| Introduction | 
        
        Parallel computing is much better suited than serial computing for modelling, simulating and understanding complex, real-world phenomena. Modern computers have several levels of parallelism. | 
  
  
  
    
      | Parallel Computers |  | 
  
  
  
    
      | Memory Organisations of Parallel Computers |  | 
  
  
  
    
      | Parallel Programming Models | 
        
        There are many layers of parallelism in modern computer systems. An application can implement vectorization, multithreading and message passing. | 
  
  
  
    
      | Independent Tasks and Job Schedulers |  | 
  
  
  
    
      | Parallel Performance and Scalability | 
        
        For a fixed problem size, increasing the number of processors decreases efficiency. For a fixed number of processors, increasing the problem size increases efficiency. A parallel problem can therefore be solved efficiently by increasing the number of processors and the problem size simultaneously. | 
  
  
  
    
      | Input and Output |  | 
  
  
  
    
      | Analyzing Performance Using a Profiler | 
        
        Don’t start to parallelize or optimize your code without having used a profiler first. A programmer can easily spend many hours of work “optimizing” a part of the code which eventually speeds up the program by only a minuscule amount. When viewing the profiler report, look for areas where the largest amounts of CPU time are spent, working your way down. Pay special attention to areas that you didn’t expect to be slow. In some cases one can achieve a 10x (or more) speedup by understanding the intrinsics of the language of choice. | 
  
  
  
    
      | Thinking in Parallel | 
        
        Efficiently parallelizing a serial code needs some careful planning and comes with an overhead. Shorter independent tasks need more overall communication. Longer tasks can cause other resources to be left unused. Large variations in task lengths can cause resources to be left unused, especially if the length of a task cannot be estimated up front. There are many textbooks and publications that describe different parallel algorithms. Try finding existing solutions for similar problems. Domain decomposition can be used in many cases to reduce communication by processing short-range interactions locally. |
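
The point about varying task lengths leaving resources unused can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the function name `simulate`, the task lengths and the "next free worker takes the next task" scheduling rule are all hypothetical and do not come from the lesson, and communication overhead is ignored entirely.

```python
import heapq


def simulate(task_lengths, n_workers):
    """Hand out tasks, in the given order, to whichever worker is free first.

    Returns (makespan, utilization): the time at which the last worker
    finishes, and the fraction of total worker time spent on useful work.
    This is a toy model (assumption): no communication or startup cost.
    """
    # Min-heap holding the time at which each worker becomes free.
    free_at = [0] * n_workers
    heapq.heapify(free_at)
    for length in task_lengths:
        start = heapq.heappop(free_at)           # earliest available worker
        heapq.heappush(free_at, start + length)  # busy until the task ends
    makespan = max(free_at)
    utilization = sum(task_lengths) / (n_workers * makespan)
    return makespan, utilization


if __name__ == "__main__":
    # Three hypothetical workloads with the same total work (32 units).
    workloads = {
        "uniform tasks": [4] * 8,
        "one long task last": [1] * 12 + [20],
        "one long task first": [20] + [1] * 12,
    }
    for name, tasks in workloads.items():
        makespan, utilization = simulate(tasks, n_workers=4)
        print(f"{name}: makespan={makespan}, utilization={utilization:.2f}")
```

With four workers, the uniform workload keeps every worker busy until the end, while the workload with one long task leaves three workers idle for most of the run. Starting the long task first shortens the run from 23 to 20 time units, which is only possible when task lengths can be estimated up front.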