ISC'12

June 17-21, 2012

Hamburg, Germany

Contribution Details

 
Name: Large Scale Simulations
(17) Three-Dimensional Particle-In-Cell Plasma Simulation on Heterogeneous Computing Systems
 
Time: Monday, June 18, 2012
3:00 PM - 8:30 PM
 
Room:   Hall H, #911
CCH - Congress Center Hamburg
 
Speakers:   Sergei Bastrakov, Lobachevsky State University of Nizhni Novgorod
  Iosif Meyerov, Lobachevsky State University of Nizhni Novgorod
 
Abstract:   Simulation of plasma dynamics with the Particle-In-Cell (PIC) method is currently one of the high-demand areas of computational physics, and in many cases it requires supercomputers. Recently, there has been growing interest in GPUs as a source of immense computational power. CPU codes (such as OSIRIS, QUICKPIC, VPIC, VLPL, UPIC) and GPU codes are both under active development, but new supercomputers have powerful CPUs and powerful GPUs alike, which raises the problem of developing heterogeneous codes. We present Picador, a three-dimensional PIC code for heterogeneous cluster systems, with the eventual goal of fully utilizing their computational resources. The simulation area is decomposed into domains, which are handled in parallel by separate processes; each process uses one or several CPU cores or a GPU. In the CPU implementation we use the widespread technique of ordering particles in memory to enable a cache-friendly memory access pattern. GPUs are programmed with OpenCL; we use a modification of the supercell approach found in some GPU PIC codes, which allows high device occupancy and coalesced memory access. We have reached 85% scaling efficiency on 2048 cores relative to 512 on the Akka CPU cluster. With weak scaling, execution time increases by 11% for 1280 cores relative to 512 and stays at that level up to 2048 cores. A single CPU core achieves floating-point throughput of ≈28% of peak in double precision. On NVIDIA Tesla X2070, the GPU version demonstrates a 3-4x speedup relative to 8 CPU cores and achieves floating-point throughput of ≈14% of peak for both single and double precision. With weak scaling on the Lomonosov cluster, execution time for 128 GPUs increases by 50% relative to 16 GPUs and stays around this level up to 512 GPUs.
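
The particle ordering mentioned in the abstract can be illustrated with a minimal sketch. This is not Picador's implementation: the particle layout (a plain position tuple), the uniform cubic cells, and the names `cell_index` and `sort_particles_by_cell` are all assumptions made for illustration. The sketch uses a counting sort so that particles belonging to the same grid cell end up adjacent in memory, which is what makes the subsequent field lookups cache-friendly.

```python
from typing import List, Tuple

# Hypothetical particle layout: only the (x, y, z) position is stored.
Particle = Tuple[float, float, float]


def cell_index(p: Particle, cell_size: float, dims: Tuple[int, int, int]) -> int:
    """Linear index of the cell containing p (x varies fastest).

    Assumes non-negative coordinates; positions on the upper
    boundary are clamped into the last cell.
    """
    nx, ny, nz = dims
    ix = min(int(p[0] / cell_size), nx - 1)
    iy = min(int(p[1] / cell_size), ny - 1)
    iz = min(int(p[2] / cell_size), nz - 1)
    return (iz * ny + iy) * nx + ix


def sort_particles_by_cell(
    particles: List[Particle], cell_size: float, dims: Tuple[int, int, int]
) -> Tuple[List[Particle], List[int]]:
    """Stable counting sort of particles by cell index.

    Returns the reordered particles and a list `starts` such that the
    particles of cell c occupy positions starts[c] .. starts[c + 1] - 1.
    """
    ncells = dims[0] * dims[1] * dims[2]
    starts = [0] * (ncells + 1)

    # Pass 1: count the particles in each cell.
    for p in particles:
        starts[cell_index(p, cell_size, dims) + 1] += 1

    # Prefix sum turns the counts into start offsets.
    for c in range(ncells):
        starts[c + 1] += starts[c]

    # Pass 2: scatter each particle into its cell's slot.
    cursor = list(starts[:-1])
    ordered: List[Particle] = [(0.0, 0.0, 0.0)] * len(particles)
    for p in particles:
        c = cell_index(p, cell_size, dims)
        ordered[cursor[c]] = p
        cursor[c] += 1

    return ordered, starts
```

After sorting, a loop over cells can load the field values for a cell once and reuse them for every particle in `ordered[starts[c]:starts[c + 1]]`. A production code would typically re-sort only periodically, since particles move between cells slowly relative to the time step.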
Program may be subject to changes.