2008 Annual Report

S3D Tiger Team

The S3D Tiger Team has worked closely with the application team to identify several optimization opportunities and to provide mechanisms for exploiting them. Specific changes include a simple transformation tool that optimizes certain Fortran statements; the modified code is being added to the application team's source base. We have also studied potential optimizations of the exp intrinsic function, which the application team will likely adopt. Scaling studies on the ORNL system identified load-balance issues arising from the XT3 nodes having lower memory bandwidth than the XT4 nodes. The Tiger Team designed a potential solution to this issue as well, although the application team has decided not to adopt it because the ORNL machine will have only XT4 nodes in the near future. Although the application team has met its Joule requirements, the Tiger Team continues to work with it to prepare for full-scale runs on the anticipated petascale platform at ORNL.
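The report does not describe the load-balancing solution that was designed. As a minimal sketch of one common approach, and not necessarily the Tiger Team's, work can be divided in proportion to each node's relative memory bandwidth so that slower nodes receive fewer grid points. The function name split_work and the weights used below are hypothetical, chosen only for illustration.

```python
def split_work(total_points, bandwidth_weights):
    """Divide total_points among nodes in proportion to their weights.

    Hypothetical helper: the weights stand in for relative memory
    bandwidth and are not measured XT3/XT4 figures.
    """
    weight_sum = sum(bandwidth_weights)
    shares = [total_points * w / weight_sum for w in bandwidth_weights]
    counts = [int(s) for s in shares]  # round every share down first
    leftover = total_points - sum(counts)
    # Hand the remaining points to the nodes with the largest fractional parts.
    by_remainder = sorted(range(len(shares)),
                          key=lambda n: shares[n] - counts[n],
                          reverse=True)
    for n in by_remainder[:leftover]:
        counts[n] += 1
    return counts


# Two slower nodes and one node assumed to have twice their bandwidth.
print(split_work(100, [1.0, 1.0, 2.0]))
```

A static split like this only helps when the run time per grid point is dominated by memory bandwidth, which is the situation the scaling studies suggest.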

GTC Tiger Team

The GTC Tiger Team was formed to assist with performance optimization and scalability issues in meeting the Joule goals for GTC: 1) running 50 percent faster on the model problem, and 2) running twice as long in simulation time for a fixed wall-clock time.

GTC solves the gyro-averaged Vlasov-Poisson system of equations using the particle-in-cell (PIC) approach. As in most particle codes, the main bottleneck in GTC is the charge deposition, or scatter, operation. The scatter algorithm in GTC is more complex than usual because it deals with fast-gyrating particles whose motion is described by charged rings tracked through their guiding centers. Hand-tuning techniques such as common subexpression elimination, code motion, loop unrolling, and cache blocking were used to improve the performance of the charge deposition routine by around 10 percent. These changes have been incorporated into the production version of the code.
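To illustrate why charge deposition is a scatter operation, the following is a minimal sketch and not the GTC routine. It assumes a periodic two-dimensional grid with unit spacing, bilinear weighting, and a charged ring approximated by four equally spaced points, a common choice in gyrokinetic PIC codes. The names deposit_point and deposit_ring are hypothetical.

```python
import math


def deposit_point(rho, x, y, q):
    """Bilinear deposit of charge q at (x, y) on a periodic, unit-spaced grid."""
    ny = len(rho)
    nx = len(rho[0])
    i = math.floor(x)
    j = math.floor(y)
    wx = x - i  # fractional distance to the lower-x grid line
    wy = y - j  # fractional distance to the lower-y grid line
    i0, i1 = i % nx, (i + 1) % nx
    j0, j1 = j % ny, (j + 1) % ny
    # Scatter: the target indices depend on particle data, so successive
    # particles write to unrelated memory locations.
    rho[j0][i0] += q * (1.0 - wx) * (1.0 - wy)
    rho[j0][i1] += q * wx * (1.0 - wy)
    rho[j1][i0] += q * (1.0 - wx) * wy
    rho[j1][i1] += q * wx * wy


def deposit_ring(rho, gx, gy, radius, q, n_points=4):
    """Deposit a charged ring centered on the guiding center (gx, gy)."""
    share = q / n_points  # computed once, outside the loop
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        deposit_point(rho,
                      gx + radius * math.cos(angle),
                      gy + radius * math.sin(angle),
                      share)


grid = [[0.0] * 8 for _ in range(8)]
deposit_ring(grid, 3.2, 4.7, 1.5, 2.0)
print(sum(map(sum, grid)))  # total deposited charge, close to 2.0
```

Each ring touches up to sixteen grid points spread over several rows, which is why cache behavior dominates the cost and why techniques such as cache blocking can pay off.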

New physics in the GTC-S version of the code allows the input of real experimental profiles and magnetic equilibria, introducing the important effects of shaped cross-section physics. Implementing this new capability required extensive changes to the code, and the goal is now to bring GTC-S to the performance level of the original GTC and beyond, to enable global simulations of plasmas in the ITER fusion reactor, which is expected to come online in about five years. Tiger Team participants have used cache modeling techniques to develop further optimizations to the charge deposition and other key GTC-S routines. These code transformations have been sent to the developers and are awaiting their evaluation. Tiger Team participants are also investigating scalability and load-imbalance issues with the new GTC-P version of the code, which partitions the poloidal plane into radial shells.
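The report does not state the cause of the load imbalance in GTC-P. One possible source, sketched here under the assumption of an annular poloidal plane and uniform particle density, is that radial shells of equal width have unequal areas and therefore unequal particle counts. The function names below are hypothetical, and real plasma density profiles are not uniform.

```python
import math


def equal_width_edges(r_min, r_max, n_shells):
    """Shell boundaries spaced evenly in radius."""
    step = (r_max - r_min) / n_shells
    return [r_min + k * step for k in range(n_shells + 1)]


def equal_area_edges(r_min, r_max, n_shells):
    """Shell boundaries chosen so that every shell has the same area."""
    a0 = r_min * r_min
    a1 = r_max * r_max
    return [math.sqrt(a0 + (a1 - a0) * k / n_shells)
            for k in range(n_shells + 1)]


def imbalance(edges):
    """Largest shell area divided by the mean shell area (1.0 is balanced)."""
    areas = [math.pi * (outer * outer - inner * inner)
             for inner, outer in zip(edges, edges[1:])]
    return max(areas) / (sum(areas) / len(areas))


print(imbalance(equal_width_edges(0.0, 4.0, 4)))
print(imbalance(equal_area_edges(0.0, 4.0, 4)))
```

Under these assumptions the outermost equal-width shell carries noticeably more than the average load, while equal-area shells are balanced by construction.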