PICKSC

Particle-in-Cell Kinetic Simulation Software Center

Particle-in-Cell and Kinetic Simulation Software Center
Funded by NSF and SciDAC

Vectorization


Vectorization Codes:

vpic2
vbpic2
vpic3
vbpic3

The 2D codes illustrate how to use vectorization on Intel processors. Two approaches are illustrated. The first uses the Intel SSE2 vector intrinsics, a low-level data-parallel programming interface closely related to the native assembly instructions. This gives the best performance but requires substantial effort and expertise. The second uses compiler directives. It often requires reorganizing the data structures and loops, but is much simpler.
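As a rough illustration of what the intrinsics approach looks like, the sketch below advances particle positions four at a time and computes the cell index and in-cell offset that a charge deposit would need. It is not taken from vpic2 or vbpic2: the function names (push_scalar, push_sse2, locate_sse2) and the flat single-precision arrays are assumptions made for this example, and it only compiles on x86 targets with SSE2.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE float intrinsics */

/* Scalar reference: advance each position by vx*dt. (Hypothetical helper.) */
void push_scalar(float *x, const float *vx, float dt, int n)
{
    for (int i = 0; i < n; i++)
        x[i] += vx[i] * dt;
}

/* Intrinsics version: four single-precision particles per iteration. */
void push_sse2(float *x, const float *vx, float dt, int n)
{
    const __m128 vdt = _mm_set1_ps(dt);      /* broadcast dt to all 4 lanes */
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 xp  = _mm_loadu_ps(&x[i]);    /* unaligned loads, so no alignment assumed */
        __m128 vxp = _mm_loadu_ps(&vx[i]);
        xp = _mm_add_ps(xp, _mm_mul_ps(vxp, vdt));
        _mm_storeu_ps(&x[i], xp);
    }
    for (; i < n; i++)                       /* scalar loop for the leftover particles */
        x[i] += vx[i] * dt;
}

/* For exactly four non-negative positions (in units of the grid spacing),
   compute the cell index and the offset within the cell. */
void locate_sse2(const float *x, int *cell, float *frac)
{
    __m128  xp = _mm_loadu_ps(x);
    __m128i n  = _mm_cvttps_epi32(xp);               /* float -> int, truncating */
    __m128  d  = _mm_sub_ps(xp, _mm_cvtepi32_ps(n)); /* offset within the cell */
    _mm_storeu_si128((__m128i *)cell, n);
    _mm_storeu_ps(frac, d);
}
```

Calling push_sse2 and push_scalar on identical inputs should give matching positions, which is a convenient check when hand-vectorizing a loop. Even in this small case the loop remainder, the load alignment, and the lane width all have to be managed by hand, which is where much of the effort of the intrinsics approach goes.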

For the 2D electrostatic:
no-vec = 35 nsec/particle/timestep
compiler vec = 18 nsec/particle/timestep
SSE2 = 12 nsec/particle/timestep

For the 2-1/2D electromagnetic:
no-vec = 100 nsec/particle/timestep
compiler vec = 60 nsec/particle/timestep
SSE2 = 34 nsec/particle/timestep

With SSE2 intrinsics one typically obtains about a 3x speedup compared to no vectorization. Compiler vectorization achieves about a 2x speedup. (All timings are on a 2.67 GHz Intel Nehalem processor.)

1. 2D Vector Electrostatic Spectral code:  vpic2
2. 2-1/2D Vector Electromagnetic Spectral code:  vbpic2

The following 3D codes illustrate how to use vectorization on Intel Xeon Phi coprocessors. Two approaches are illustrated. The first uses the Intel Knights Corner (KNC) MIC vector intrinsics, a low-level data-parallel programming interface closely related to the native assembly instructions. This gives the best performance but requires substantial effort and expertise. The second uses compiler directives. It often requires reorganizing the data structures and loops, but is much simpler. Only a single core of the Xeon Phi is used.
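The kind of data reorganization that the compiler-directive approach tends to require can be sketched as follows. This is an illustrative example only, not the layout used in vpic3 or vbpic3: the particle struct, the function names push_aos and push_soa, and the choice of the OpenMP simd directive are all assumptions. With an array of structures, the x values of consecutive particles are strided in memory; with one array per coordinate they are contiguous, which is easier for a compiler to vectorize.

```c
/* Array-of-structures layout: the six fields of one particle are adjacent. */
typedef struct { float x, y, z, vx, vy, vz; } particle;

/* Straightforward loop over the array of structures. Each field access
   is strided by sizeof(particle), which hinders vectorization. */
void push_aos(particle *p, int n, float dt)
{
    for (int i = 0; i < n; i++) {
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}

/* Structure-of-arrays layout: one contiguous array per coordinate.
   restrict tells the compiler the arrays do not overlap, and the
   directive asks it to vectorize the loop. */
void push_soa(float *restrict x, float *restrict y, float *restrict z,
              const float *restrict vx, const float *restrict vy,
              const float *restrict vz, int n, float dt)
{
    #pragma omp simd
    for (int i = 0; i < n; i++) {
        x[i] += vx[i] * dt;
        y[i] += vy[i] * dt;
        z[i] += vz[i] * dt;
    }
}
```

The directive only takes effect when the compiler is asked to honor it (for example -fopenmp-simd with GCC or Clang); otherwise it is ignored and the loop is still correct. Both functions compute the same result, so the reorganized version can be validated against the original before timing it.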

For the 3D electrostatic:
no-vec = 547 nsec/particle/timestep
compiler vec = 264 nsec/particle/timestep
KNC = 198 nsec/particle/timestep

For the 3D electromagnetic:
no-vec = 1031 nsec/particle/timestep
compiler vec = 589 nsec/particle/timestep
KNC = 469 nsec/particle/timestep
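The speedups implied by the timings quoted on this page can be worked out directly as the ratio of the unvectorized time to the vectorized time. The small helper below does that arithmetic. The struct and function names are hypothetical; the numbers are the ones listed above, in nsec/particle/timestep.

```c
#include <stdio.h>

/* One row of the timings quoted on this page (nsec/particle/timestep). */
typedef struct {
    const char *name;
    double no_vec;        /* no vectorization */
    double compiler_vec;  /* compiler directives */
    double intrinsics;    /* SSE2 (2D codes) or KNC (3D codes) intrinsics */
} timing;

static const timing timings[] = {
    {"2D electrostatic",        35.0,   18.0,  12.0},
    {"2-1/2D electromagnetic",  100.0,  60.0,  34.0},
    {"3D electrostatic",        547.0,  264.0, 198.0},
    {"3D electromagnetic",      1031.0, 589.0, 469.0},
};

/* Speedup is baseline time divided by optimized time. */
double speedup(double baseline, double optimized)
{
    return baseline / optimized;
}

/* Print the compiler and intrinsics speedups for every row. */
void print_speedups(void)
{
    int n = (int)(sizeof timings / sizeof timings[0]);
    for (int i = 0; i < n; i++)
        printf("%-24s compiler %.2fx  intrinsics %.2fx\n",
               timings[i].name,
               speedup(timings[i].no_vec, timings[i].compiler_vec),
               speedup(timings[i].no_vec, timings[i].intrinsics));
}
```

For the 3D codes on a single Xeon Phi core, these ratios come out somewhat lower than for the 2D SSE2 codes: a little over 2x with compiler vectorization and under 3x with KNC intrinsics for the electrostatic code, and less than that for the electromagnetic code.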

3. 3D Vector Electrostatic Spectral code: vpic3
4. 3D Vector Electromagnetic Spectral code: vbpic3

Want to contact the developer?

Send mail to Viktor Decyk – decyk@physics.ucla.edu 

© 2014 UC REGENTS TERMS OF USE & PRIVACY POLICY
