This lesson is being piloted (Beta version)

ACENET Summer School - General: Glossary

Key Points

Introduction
  • Parallel computing is much better suited than serial computing for modelling, simulating and understanding complex, real-world phenomena.

  • Modern computers have several levels of parallelism.

Parallel Computers
  • Parallel computers still follow the basic von Neumann CPU design.

  • Parallel computers can be divided into four groups based on the number of instruction and data streams (Flynn's taxonomy).

Memory Organisations of Parallel Computers
  • The amount of information that must be shared by parallel tasks is one of the key parameters dictating the choice of the memory model.

Parallel Programming Models
  • There are many layers of parallelism in modern computer systems.

  • A single application can combine vectorization, multithreading and message passing.
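As a minimal illustration of one of these layers, the sketch below contrasts an element-by-element Python loop with a single vectorized call. It assumes NumPy is installed; the array name `a` is invented for this example.

```python
import numpy as np

# One million values; the loop and the vectorized call compute the same sum.
a = np.arange(1_000_000, dtype=np.float64)

# Scalar loop: one Python-level operation per element.
s_loop = 0.0
for x in a:
    s_loop += x

# Vectorized: a single call, with the iteration done in optimized compiled code.
s_vec = a.sum()

print(s_loop, s_vec)
```

The vectorized version is typically much faster because the per-element work happens inside compiled library code rather than the Python interpreter.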

Independent Tasks and Job Schedulers
Parallel Performance and Scalability
  • Increasing the number of processors for a fixed problem size leads to a decrease in efficiency.

  • Increasing the problem size leads to an increase in efficiency.

  • Many parallel problems can therefore be solved efficiently by increasing the number of processors and the problem size simultaneously (weak scaling).
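These observations can be checked numerically with Amdahl's law, S(p) = 1 / (f + (1 - f)/p), where f is the serial fraction of the code and p the number of processors. A minimal sketch (the function names are invented for this example):

```python
def speedup(p, serial_fraction):
    """Amdahl's law: speedup on p processors for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)

def efficiency(p, serial_fraction):
    """Parallel efficiency: speedup divided by the number of processors."""
    return speedup(p, serial_fraction) / p

# With 5% serial code, efficiency drops as processors are added ...
print(round(efficiency(4, 0.05), 2), round(efficiency(64, 0.05), 2))
# ... while a larger problem (smaller serial fraction) is more efficient
# on the same number of processors.
print(round(efficiency(64, 0.01), 2))
```

Playing with `serial_fraction` shows why growing the problem alongside the processor count keeps efficiency up.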

Input and Output
Analyzing Performance Using a Profiler
  • Don’t start to parallelize or optimize your code without having used a profiler first.

  • A programmer can easily spend many hours of work “optimizing” a part of the code which eventually speeds up the program by only a minuscule amount.

  • When viewing the profiler report, look first at the areas where the largest amounts of CPU time are spent and work your way down.

  • Pay special attention to areas that you didn’t expect to be slow.

  • In some cases one can achieve a 10x (or more) speedup by understanding the intricacies of the language of choice.
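As a sketch of this workflow, Python's standard-library `cProfile` module reports where time is spent; the functions `slow_part` and `fast_part` below are invented stand-ins for real application code.

```python
import cProfile
import io
import pstats

def slow_part(n):
    # Deliberately does Python-level work so it dominates the profile.
    return sum(i * i for i in range(n))

def fast_part(n):
    return n * n

def main():
    slow_part(200_000)
    fast_part(200_000)

# Profile main() and print the five most expensive entries
# sorted by cumulative time, largest first.
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
```

Reading the report top-down immediately shows that `slow_part` is where optimization effort should go, and that time spent "optimizing" `fast_part` would be wasted.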

Thinking in Parallel
  • Efficiently parallelizing a serial code requires careful planning and comes with an overhead.

  • Many short independent tasks require more overall communication.

  • Fewer, longer tasks can cause other resources to be left idle.

  • Large variations in task length can also leave resources idle, especially if the length of a task cannot be estimated up front.

  • There are many textbooks and publications that describe different parallel algorithms. Try finding existing solutions for similar problems.

  • Domain decomposition can be used in many cases to reduce communication by processing short-range interactions locally.
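The decomposition idea above can be sketched in a few lines: split the data into contiguous chunks, let each "worker" compute a local result on its own chunk, then combine the partial results in a single reduction. This serial sketch (all names are invented for the example) only illustrates the data-splitting pattern; in a real application each chunk would go to a separate process.

```python
def decompose(data, nworkers):
    """Split data into nworkers contiguous chunks of near-equal size."""
    base, extra = divmod(len(data), nworkers)
    chunks, start = [], 0
    for w in range(nworkers):
        size = base + (1 if w < extra else 0)  # spread the remainder evenly
        chunks.append(data[start:start + size])
        start += size
    return chunks

data = list(range(100))
chunks = decompose(data, 4)

# Each worker computes a local (partial) result on its own chunk ...
partial_sums = [sum(chunk) for chunk in chunks]

# ... and a single reduction combines the partial results.
total = sum(partial_sums)
print(total)  # → 4950
```

Keeping the chunks near-equal in size addresses the load-imbalance point above: no worker is left idle waiting for a much larger chunk to finish elsewhere.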

Glossary

FIXME