Vectorization Codes:
vpic2
vbpic2
vpic3
vbpic3
The 2D codes illustrate how to use vectorization with the Intel Processors. Two approaches are illustrated. One uses the Intel SSE2 vector intrinsics, which is a low level data parallel language closely related to the native assembly instructions. This gives the best performance but requires substantial effort and expertise. The other approach uses compiler directives and often requires reorganization of the data structures and loops, but is much simpler.
For the 2D electrostatic:
no-vec = 35 nsec/particle/timestep
compiler vec = 18 nsec/particle/timestep
SSE2 = 12 nsec/particle/timestep
For the 2-1/2D electromagnetic:
no-vec = 100 nsec/particle/timestep
compiler vec = 60 nsec
SSE2 = 34 nsec
With SSE2 intrinsics one typically obtains about 3x speedup compared to no vectorization. Compiler vectorization achieves about 2x speedup. (All timings are on a 2.67GHz Intel Nehalem processor.)
1. 2D Vector Electrostatic Spectral code: vpic2
2. 2-1/2D Vector Electromagnetic Spectral code: vbpic2
The following 3D codes illustrate how to use vectorization with the Intel PHI Coprocessors. Two approaches are illustrated. One uses the Intel Knight’s Corner (KNC) MIC vector intrinsics, which is a low level data parallel language closely related to the native assembly instructions. This gives the best performance but requires substantial effort and expertise. The other approach uses compiler directives and often requires reorganization of the data structures and loops, but is much simpler. Only a single core of the PHI is used.
For the 3D electrostatic:
no-vec = 547 nsec/particle/timestep
compiler vec = 264 nsec/particle/timestep
KNC = 198 nsec/particle/timestep
For the 3D electromagnetic:
no-vec = 1031 nsec/particle/timestep
compiler vec = 589 nsec/particle/timestep
KNC = 469 nsec/particle/timestep
3. 3D Vector Electrostatic Spectral code: vpic3
4. 3D Vector Electromagnetic Spectral code: vbpic3
Want to contact developer?