PICKSC

Particle-in-Cell and Kinetic Simulation Software Center

Funded by NSF and SciDAC

OpenMP/Vectorization


OpenMP/Vectorization Codes:

vmpic2
vmpic3
vmbpic2
vmbpic3

These codes illustrate how to use a hybrid shared-memory/vectorization algorithm. A tiled scheme on each shared-memory multi-core node is implemented with OpenMP, and vectorization is implemented both with vector intrinsics, using SSE2 (for 2D) or KNC (for 3D), and with compiler vectorization. KNC refers to the Knights Corner Intel Xeon Phi. The tiling scheme is described in detail in Ref. [4]. The Intel SSE2 and KNC vector intrinsics are a low-level data-parallel language closely related to the native assembly instructions. Compiler vectorization relies on compiler directives and often requires reorganizing the data structures and loops.
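As a rough illustration of how the OpenMP tiling and the compiler vectorization fit together, the sketch below advances particle positions tile by tile. It is a minimal example under stated assumptions, not code taken from vmpic2 or the other skeleton codes: the function name push_tiles, the structure-of-arrays layout, and the tile_start offset array are all hypothetical, and the update is a simple free-streaming step rather than a full push.

```c
#include <stddef.h>

/* Hypothetical layout (an assumption, not the skeleton codes' own):
   particles are stored tile by tile in separate arrays, and the particles
   of tile t occupy indices tile_start[t] .. tile_start[t+1]-1. */
void push_tiles(float *x, const float *vx, const int *tile_start,
                int ntiles, float dt)
{
    /* Shared-memory level: each OpenMP thread takes whole tiles,
       which can be processed independently of one another. */
    #pragma omp parallel for schedule(dynamic)
    for (int t = 0; t < ntiles; t++) {
        const int first = tile_start[t];
        const int last  = tile_start[t + 1];

        /* Vector level: unit-stride loop with no dependences between
           iterations, marked with a directive so the compiler may
           vectorize it. */
        #pragma omp simd
        for (int j = first; j < last; j++)
            x[j] += vx[j] * dt;
    }
}
```

Keeping each particle component in its own contiguous array is one example of the data reorganization mentioned above: it gives the inner loop unit-stride access. Without an OpenMP-enabled compiler flag the pragmas are ignored and the function runs serially with the same result.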

For the 2D electrostatic code with 12 processing cores:
no-vec = 2.7 nsec/particle/timestep
compiler vec = 2.0 nsec/particle/timestep
SSE2 = 1.6 nsec/particle/timestep

For the 2-1/2D electromagnetic code with 12 processing cores:
no-vec = 9.2 nsec/particle/timestep
compiler vec = 6.1 nsec/particle/timestep
SSE2 = 4.2 nsec/particle/timestep

With SSE2 intrinsics one typically obtains about a 2x speedup compared to no vectorization (1.7x and 2.2x in the two cases above). Compiler vectorization achieves about a 1.4-1.5x speedup.
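To give a sense of what hand-written intrinsics look like, here is a minimal sketch of a position update that processes four single-precision values per iteration. It is an illustrative assumption, not an excerpt from the skeleton codes: the function name push_sse and the free-streaming update are hypothetical. The intrinsics used (_mm_set1_ps, _mm_loadu_ps, _mm_mul_ps, _mm_add_ps, _mm_storeu_ps) are standard Intel intrinsics available through the SSE2 header, and the code falls back to a scalar loop where SSE2 is not available.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 header; also provides the SSE float intrinsics */
#endif

/* Advance positions by x[j] += vx[j]*dt (hypothetical example kernel). */
void push_sse(float *x, const float *vx, size_t n, float dt)
{
    size_t j = 0;
#if defined(__SSE2__)
    const __m128 vdt = _mm_set1_ps(dt);       /* broadcast dt to all 4 lanes */
    for (; j + 4 <= n; j += 4) {
        __m128 px = _mm_loadu_ps(x + j);      /* unaligned load, 4 positions */
        __m128 pv = _mm_loadu_ps(vx + j);     /* unaligned load, 4 velocities */
        px = _mm_add_ps(px, _mm_mul_ps(pv, vdt));
        _mm_storeu_ps(x + j, px);             /* write 4 updated positions */
    }
#endif
    /* Scalar loop for the remainder (or for everything without SSE2). */
    for (; j < n; j++)
        x[j] += vx[j] * dt;
}
```

Each intrinsic maps closely onto one vector instruction, which is why this style gives more control than compiler vectorization at the cost of portability.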

For the 3D electrostatic code with 60 processing cores:
no-vec = 4.2 nsec/particle/timestep
compiler vec = 2.8 nsec/particle/timestep
KNC = 2.1 nsec/particle/timestep

For the 3D electromagnetic code with 60 processing cores:
no-vec = 10.2 nsec/particle/timestep
compiler vec = 6.0 nsec/particle/timestep
KNC = 4.8 nsec/particle/timestep

With KNC intrinsics one typically obtains about a 2x speedup compared to no vectorization. Compiler vectorization achieves about a 1.5-1.7x speedup.
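The quoted speedups are ratios of the no-vec time to the vectorized time for each case. The short sketch below recomputes them from the timings listed on this page; the struct and function names are hypothetical and serve only this illustration.

```c
#include <stdio.h>

/* Timings in nsec/particle/timestep, copied from the lists on this page. */
struct timing {
    const char *name;
    double no_vec, compiler_vec, intrinsics;
};

static const struct timing cases[] = {
    {"2D electrostatic (SSE2, 12 cores)",        2.7, 2.0, 1.6},
    {"2-1/2D electromagnetic (SSE2, 12 cores)",  9.2, 6.1, 4.2},
    {"3D electrostatic (KNC, 60 cores)",         4.2, 2.8, 2.1},
    {"3D electromagnetic (KNC, 60 cores)",      10.2, 6.0, 4.8},
};
enum { NCASES = sizeof cases / sizeof cases[0] };

/* Speedup of an optimized run relative to a reference run. */
double speedup(double t_ref, double t_opt)
{
    return t_ref / t_opt;
}

/* Print one line per case with both speedups. */
void print_speedups(void)
{
    for (int i = 0; i < NCASES; i++)
        printf("%-42s compiler vec %.2fx, intrinsics %.2fx\n",
               cases[i].name,
               speedup(cases[i].no_vec, cases[i].compiler_vec),
               speedup(cases[i].no_vec, cases[i].intrinsics));
}
```

Calling print_speedups() from a main function lists the ratio for every case, which makes it easy to compare the hand-written intrinsics against compiler vectorization.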

 

1. 2D Parallel Electrostatic Spectral code: vmpic2
2. 3D Parallel Electrostatic Spectral code: vmpic3
3. 2-1/2D Parallel Electromagnetic Spectral code: vmbpic2
4. 3D Parallel Electromagnetic Spectral code: vmbpic3

Want to contact the developer?

Send mail to Viktor Decyk – decyk@physics.ucla.edu 

© 2014 UC REGENTS
