PERI Publications

2010

Bradley Barnes, Jeonifer Garren, David K. Lowenthal, Jaxk Reeves, Bronis R. de Supinski, Martin Schulz and Barry Rountree. Using Focused Regression for Accurate Time-Constrained Scaling of Scientific Applications, Twenty Fourth International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, April 19–23, 2010.
Daniel Bedard, Min Yeol Lim, Robert Fowler, Allan Porterfield. PowerMon: Fine-Grained and Integrated Power Monitoring for Commodity Computer Systems, Proceedings IEEE Southeastcon 2010.
Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler and Daniel A. Reed. Clustering Performance Data Efficiently at Massive Scales, Twenty Fourth International Conference on Supercomputing (ICS 2010), Tsukuba, Japan, June 1–4, 2010.
Dong Li, Bronis R. de Supinski, Martin Schulz, Kirk Cameron and Dimitrios S. Nikolopoulos. Power-aware MPI Task Aggregation Prediction for High-End Computing Systems, Twenty Fourth International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, April 19–23, 2010.
Dong Li, Dimitrios S. Nikolopoulos, Kirk Cameron, Bronis R. de Supinski and Martin Schulz. Hybrid MPI/OpenMP Power-Aware Computing, Twenty Fourth International Parallel and Distributed Processing Symposium (IPDPS 2010), Atlanta, GA, April 19–23, 2010.
Chunhua Liao, Daniel J. Quinlan, Thomas Panas and Bronis R. de Supinski. A ROSE-based OpenMP 3.0 Research Compiler Supporting Multiple Runtime Libraries, Sixth International Workshop on OpenMP (IWOMP 2010), Tsukuba, Japan, June 14–16, 2010.
Anirban Mandal, Rob Fowler, Allan Porterfield. Modeling Memory Concurrency for Multi-Socket Multi-Core Systems, 2010 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS2010).
Frank Mueller, Xing Wu, Martin Schulz, Todd Gamblin and Bronis R. de Supinski. ScalaTrace: Tracing, Analysis and Modeling of HPC Codes at Scale. Para 2010: State of the Art in Scientific and Parallel Computing, Reykjavík, Iceland, June 6-9, 2010.
Sri Hari Krishna Narayanan, Boyana Norris, and Paul D. Hovland. Generating performance bounds from source code. In Proceedings of the First International Workshop on Parallel Software Tools and Tool Infrastructures (PSTI 2010), September 2010.
Robert Preissl, Bronis R. de Supinski, Martin Schulz, Daniel J. Quinlan, Dieter Kranzlmueller and Thomas Panas. Exploitation of Dynamic Communication Patterns through Static Analysis. 2010 International Conference on Parallel Processing (ICPP-10), San Diego, CA, September 13-16, 2010.
Robert Preissl, Martin Schulz, Dieter Kranzlmueller, Bronis R. de Supinski and Daniel J. Quinlan. Transforming MPI Source Code Based on Communication Patterns, Future Generation Computer Systems, Vol. 26, No. 1, January 2010, pp. 147-154.
Karan Singh, Matthew Curtis-Maury, Sally A. McKee, Filip Blagojevic, Dimitris S. Nikolopoulos, Bronis R. de Supinski and Martin Schulz. Comparing Scalability Prediction Strategies on an SMP of CMPs, Euro-Par 2010, Naples, Italy, August 31–September 3, 2010.
Nathan R. Tallent, John M. Mellor-Crummey, Allan Porterfield. Analyzing Lock Contention in Multithreaded Applications, PPoPP 2010.

2009

M. Adams, S. Ku, P. Worley, E. D'Azevedo, J. Cummings, and C-S. Chang. Scaling to 150K cores: recent algorithm and performance engineering developments enabling XGC1 to run at scale, Journal of Physics: Conference Series, 180 (2009) 012036. (Proceedings of SciDAC 2009, San Diego, CA, June 14-18, 2009.)
Dong H. Ahn. Overcoming Scalability Challenges for Tool Daemon Launching, 2008 International Conference on Parallel Processing (ICPP-08), Portland, OR, USA, Jan 2009.
S. Alam, R. Barrett, H. Jagode, J. Kuehn, S. Poole and R. Sankaran. Impact of Quad-core Cray XT4 System and Software Stack on Scientific Computation, IEEE IPDPS09, Rome, Italy, May 2009.
V. Bui, B. Norris, and L. C. McInnes. An Automated Component-Based Performance Experiment Environment, Proceedings of the 2009 Workshop on Component-Based High Performance Computing (CBHPC 2009), Nov 2009.
C-S. Chang, S. Ku, P. Diamond, M. Adams, R. Barreto, Y. Chen, J. Cummings, E. D'Azevedo, G. Dif-Pradalier, S. Ethier, L. Greengard, T. Hahm, F. Hinton, D. Keyes, S. Klasky, Z. Lin, J. Lofstead, G. Park, S. Parker, N. Podhorszki, K. Schwan, A. Shoshani, D. Silver, M. Wolf, H. Weitzner, P. Worley, E. Yoon, and D. Zorin. Whole-volume integrated gyrokinetic simulation of plasma turbulence in realistic diverted-tokamak geometry, Journal of Physics: Conference Series, 180 (2009) 012057. (Proceedings of SciDAC 2009, San Diego, CA, June 14-18, 2009.)
J. H. Chen, A. Choudhary, B. R. de Supinski, M. DeVries, E. R. Hawkes, S. Klasky, W. K. Liao, K. L. Ma, J. Mellor-Crummey, N. Podhorszki, R. Sankaran, S. Shende, and C. S. Yoo. Terascale direct numerical simulations of turbulent combustion using S3D, Computational Science & Discovery, 2 015001, 2009. PDF
B. R. de Supinski, S. Alam, D. H. Bailey, L. Carrington, C. Daley, A. Dubey, T. Gamblin, D. Gunter, P. D. Hovland, H. Jagode, K. Karavanic, G. Marin, J. Mellor-Crummey, S. Moore, B. Norris, L. Oliker, C. Olschanowsky, P. C. Roth, M. Schulz, S. Shende, A. Snavely, W. Spear, M. Tikir, J. Vetter, P. Worley, and N. Wright. Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team, Journal of Physics: Conference Series, 180 (2009) 012039.
R. Fowler, L. Adhianto, B. R. de Supinski, M. Fagan, T. Gamblin, M. Krentel, J. Mellor-Crummey, M. Schulz and N. Tallent. Frontiers of Performance Analysis on Leadership Class Systems, SciDAC 2009, San Diego, California, June 14-18, 2009.
Todd Gamblin. Scalable Performance Measurement and Analysis, Ph.D. Dissertation, Department of Computer Science, University of North Carolina, 2009.
Mary Hall, Jacqueline Chame, Chun Chen and Jaewook Shin. Transformation Recipes for Code Generation and Auto-Tuning, The 22nd International Workshop on Languages and Compilers for Parallel Computing, Newark, Delaware, USA. October 8-10, 2009
A. Hartono, M. M. Baskaran, C. Bastoul, A. Cohen, S. Krishnamoorthy, B. Norris, J. Ramanujam, and P. Sadayappan. PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests, Proceedings of the 23rd International Conference on Supercomputing, Jun 8-12, 2009, IBM T. J. Watson Research Center, Yorktown Heights, NY, USA, 2009.
A. Hartono, B. Norris, and P. Sadayappan. Annotation-based empirical performance tuning using Orio, In Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium, Rome, Italy, 2009. Draft bib
H. Jagode, J. Dongarra, S. Alam, J. Vetter, W. Spear, and A. Malony. A Holistic Approach for Performance Measurement and Analysis for Petascale Applications, ICCS 2009 Joint Workshop: Tools for Program Development and Analysis in Computational Science and Software Engineering for Large-Scale Computing, G. Allen et al. eds. Baton Rouge, Louisiana, Springer-Verlag Berlin Heidelberg 2009, ICCS 2009, Part II, LNCS 5545, pp. 686-695, May 25-27, 2009. PDF
H. Jagode, A. Knuepfer, S. Moore, D. Terpstra, J. Dongarra, M. Jurenz, M. S. Mueller, and W. E. Nagel. Trace-based Performance Analysis for the Petascale Simulation Code FLASH, Innovative Computing Laboratory Technical Report, ICL-UT-09-01, April 15, 2009. PDF
D. Li, B. R. de Supinski, M. Schulz, K. W. Cameron and D. S. Nikolopoulos. Model-Based Hybrid MPI/OpenMP Power-Aware Computing, a poster at SC2009, Portland, Oregon, November 14–20, 2009.
J. Li, X. Ma, K. Singh, M. Schulz, B. R. de Supinski and S. McKee, "Machine Learning Based Online Performance Prediction for Runtime Parallelization and Task Scheduling," 2009 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Boston, Massachusetts, April 26–28, 2009.
Y. Li, J. Dongarra, and S. Tomov. A note on auto-tuning GEMM for GPUs, University of Tennessee Computer Science Technical Report (also LAPACK Working Note 212), UT-CS-09-635, January 12, 2009. PDF
Chunhua Liao, Daniel J. Quinlan, Richard Vuduc and Thomas Panas. Effective Source-to-Source Outlining to Support Whole Program Empirical Optimization, The 22nd International Workshop on Languages and Compilers for Parallel Computing, Newark, Delaware, USA. October 8-10, 2009 Draft
Min Yeol Lim. Improving Power and Performance Efficiency in Parallel and Distributed Computing Systems, Ph.D. dissertation, Department of Computer Science, North Carolina State University, 2009.
Richard Tran Mills, Vamsi Sripathi, G. Mahinthakumar, Glenn E. Hammond, Peter C. Lichtner and Barry F. Smith. Experiences and Challenges Scaling PFLOTRAN, a PETSc-based Code for Subsurface Reactive Flow Simulations, Towards the Petascale on Cray XT Systems, Proceedings of the 51st Cray Users Group Meeting, May 2009, Atlanta, GA, 14 pages
R. Mills, F. Hoffman, P. Worley, K. Perumalla, A. Mirin, G. Hammond, and B. Smith. Coping at the User-Level with Resource Limitations in the Cray Message Passing Toolkit MPI at Scale: How Not to Spend Your Summer Vacation, in Proceedings of the 51st Cray User Group Conference, Atlanta, GA, May 4-7, 2009.
Kathryn Mohror, Karen L. Karavanic, Allan Snavely. Scalable Event Trace Visualization, 2nd Workshop on Productivity and Performance (PROPER 2009), EuroPar 2009, Delft, The Netherlands, August 2009
M. Noeth, P. Ratn, F. Mueller, M. Schulz and B. R. de Supinski. ScalaTrace: Scalable Compression and Replay of Communication Traces in Massively Parallel Environments, Journal of Parallel and Distributed Computing (JPDC), Vol. 69, No. 8, August 2009, pp. 696-710.
Boyana Norris, Albert Hartono, Elizabeth Jessup and Jeremy Siek. Generating Empirically Optimized Composed Matrix Kernels from MATLAB Prototypes, Proceedings of the International Conference on Computational Science 2009, Baton Rouge, Louisiana, U.S.A., Preprint ANL/MCS-P1581-0209, May 2009. Draft bib
Catherine Mills Olschanowsky, Mustafa Tikir, Laura Carrington, Allan Snavely. Accurate Synthetic Memory Address Streams, DoD UGC, 2009
PAPI team. PAPI 3.7.0 Release Notes, Web
Allan Porterfield, Rob Fowler, Anirban Mandal, and Min Yeol Lim. Empirical Evaluation of Multi-Core Memory Concurrency, UNC/RENCI Technical Report TR-09-01, RENCI, Chapel Hill, North Carolina, January 2009.
B. Rountree, D. K. Lowenthal, B. R. de Supinski, M. Schulz, V. W. Freeh and T. Bletsch, "Adagio: Making DVS Practical for Complex HPC Applications," Twenty Third International Conference on Supercomputing (ICS 2009), Yorktown Heights, New York, June 8-12, 2009.
Vivek Sarkar, William Harrod, Allan E. Snavely. Software Challenges in Extreme Scale Systems, 2009 Conference on Scientific Discovery through Advanced Computing Program (SciDAC), June 2009.
M. Schulz, A. W. Cook, W. H. Cabot, B. R. de Supinski and W. D. Krauss. On the Performance of the Miranda CFD code on Multicore Architectures, Twenty First International Conference on Parallel Computational Fluid Dynamics (ParallelCFD 2009), Moffett Field, California, May 18-22, 2009.
Jaewook Shin, Mary Hall, Jacqueline Chame, Chun Chen, Paul Fischer and Paul Hovland. Autotuning and Specialization: Speeding up Matrix Multiply for Small Matrices with Compiler Technology, The Fourth International Workshop on Automatic Performance Tuning (IWAPT 2009), October 1-2 2009, Tokyo Japan.
Fengguang Song, Shirley Moore, and Jack Dongarra. Analytical Modeling for Affinity-based Thread Scheduling on Multicore Platforms, 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2009), Feb 14-18, 2009, Raleigh, North Carolina.
F. Song, S. Moore, and J. Dongarra. Analytical Modeling and Optimization for Affinity Based Thread Scheduling on Multicore Systems, IEEE Cluster 2009, New Orleans, Aug. 31 - Sept. 4, 2009. PDF
F. Song, A. YarKhan, and J. Dongarra. Dynamic Task Scheduling for Linear Algebra Algorithms on Distributed-Memory Multicore Systems, University of Tennessee Computer Science Technical Report, UT-CS-09-638, April 13, 2009. PDF
Vamsi Sripathi, Glenn E. Hammond, G. Mahinthakumar, Richard T. Mills, Patrick H. Worley and Peter C. Lichtner. Performance Analysis and Optimization of Parallel I/O in a large scale groundwater application on the Cray XT5, poster presentation, Supercomputing 2009, Portland, Oregon, Nov 12-16.
Nathan R. Tallent, John M. Mellor-Crummey, Laksono Adhianto, Michael W. Fagan, and Mark Krentel. Diagnosing Performance Bottlenecks in Emerging Petascale Applications, SC09, November, 2009
Nathan R. Tallent, John M. Mellor-Crummey. Effective Performance Analysis of Work Stealing, IEEE Computer, March 2009 (submitted)
Nathan Tallent, John Mellor-Crummey, and Michael Fagan. Binary analysis for measurement and attribution of program performance, In Proceedings of the ACM SIGPLAN Symposium on Program Language Design and Implementation (PLDI), Dublin, Ireland, June 2009. Distinguished paper award Draft (PDF)
Nathan Tallent and John Mellor-Crummey. Effective performance measurement and analysis of multithreaded applications, In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Raleigh, North Carolina, USA, February 2009. DOI Draft (PDF)
Mustafa M. Tikir, Michael Laurenzano, Laura Carrington, and Allan Snavely. PSINS: An Open Source Event Tracer and Execution Simulator for MPI Applications, EuroPar 2009, Delft, the Netherlands, August 2009 (Accepted for publication)
Mustafa M. Tikir, Michael A. Laurenzano, Laura Carrington, Allan Snavely. PSINS: An Open Source Event Tracer and Execution Simulator, DoD UGC, 2009
Ananta Tiwari, Chun Chen, Jacqueline Chame, Mary Hall and Jeff Hollingsworth. Scalable Autotuning Framework for Compiler Optimization, Proceedings of the IEEE International Parallel and Distributed Processing Symposium (IPDPS’09), Rome, Italy, May 2009
A. Tiwari, V. Tabatabaee, J. K. Hollingsworth. Tuning Parallel Applications in Parallel, Parallel Computing, 35 (2009), pp. 475-492.
P. Worley, R. Barrett, and J. Kuehn. Early Evaluation of the Cray XT5, Proceedings of the 51st Cray User Group Conference, Atlanta, GA, May 4-7, 2009
Nicholas J. Wright, Wayne Pfeiffer and Allan Snavely. Characterizing Parallel Scaling of Scientific Applications using IPM, The 10th LCI International Conference on High-Performance Clustered Computing, Denver, CO, USA, March 10-12 2009

2008

Laksono Adhianto, Sinchan Banerjee, Michael Fagan, Mark Krentel, Gabriel Marin, John Mellor-Crummey and Nathan Tallent. HPCToolkit: Tools for performance analysis of optimized parallel programs, Concurrency and Computation: Practice and Experience, August 2008. To appear.
L. Adhianto, M. Fagan, M. Krentel, G. Marin, J. Mellor-Crummey, and N. R. Tallent. HPCToolkit: Performance Measurement and Analysis for Supercomputers with Node-Level Parallelism, Workshop on Node Level Parallelism for Large Scale Supercomputers, held in conjunction with SC08, 2008
Dong H. Ahn, Dorian C. Arnold, Bronis R. de Supinski, Gregory L. Lee, Barton P. Miller, and Martin Schulz. Overcoming Scalability Challenges for Tool Daemon Launching, 2008 International Conference on Parallel Processing (ICPP-08), Portland, OR, September 8-12, 2008
S. R. Alam, N. Bhatia, and J. S. Vetter. An Exploration of Performance Attributes for Symbolic Modeling of Emerging Processing Devices, 3rd International High Performance Computation Conference (HPCC), 2007
S. R. Alam, N. Bhatia, and J. S. Vetter. Sensitivity Analysis of Biomolecular Simulations using Symbolic Models, 7th International Conference on BioInformatics and BioEngineering, Boston, 2007
S. R. Alam, J. S. Meredith and J. S. Vetter. Balancing Productivity and Performance on the Cell Broadband Engine, IEEE Annual International Conference on Cluster Computing, 2007
S. R. Alam, P. Agarwal, H. Ong, S. Hampton and J. S. Vetter. Impact of multicores on large-scale molecular dynamics simulations, IEEE International Workshop on High Performance Computational Biology (HiCOMB), in conjunction with IPDPS, 2008, pp. 1-7.
S. R. Alam, R. F. Barrett, M. R. Fahey et al. An Evaluation of the Oak Ridge National Laboratory Cray XT3, International Journal of High Performance Computing Applications, vol. 22, no. 1, pg. 52-80, 2008
S. R. Alam, R. F. Barrett, M. Eisenbach, M. R. Fahey, R. Hartman-Baker, J. A. Kuehn, S. W. Poole, R. Sankaran and P. H. Worley. The Cray XT4 Quad-core: A First Look, 50th Cray User Group Conference, Helsinki, Finland, May 5-8, 2008
S. R. Alam, R. F. Barrett, M. Bast, M. R. Fahey, J. Kuehn, C. McCurdy, J. Rogers, P. C. Roth, R. Sankaran, J. S. Vetter, P. H. Worley and W. Yu. Early Evaluation of IBM BlueGene/P, SC08, Austin, TX, November 15-21, 2008
David Bailey, Jacqueline Chame, Chun Chen, Jack Dongarra, Mary Hall, Jeffrey K. Hollingsworth, Paul Hovland, Shirley Moore, Keith Seymour, Jaewook Shin, Ananta Tiwari, Sam Williams, Haihang You. PERI Auto-Tuning, Journal of Physics: Conference Series 125 (2008), Nov. 2008 Draft
Bradley Barnes, Barry Rountree, David K. Lowenthal, Jaxk Reeves, Bronis R. de Supinski and Martin Schulz. A Regression-Based Approach to Scalability Prediction, 22nd International Conference on Supercomputing (ICS 2008), Kos, Greece, June 7-12, 2008.
R. F. Barrett, M. R. Fahey, et al. An Evaluation of the Oak Ridge National Laboratory Cray XT3, International Journal of High Performance Computing Applications, vol. 22, no. 1, February 2008, pg. 52-80
Victor R. Basili, Jeff Carver, Daniela Cruzes, Lorin Hochstein, Jeffrey K. Hollingsworth, Forrest Shull, Marvin V. Zelkowitz. Understanding The High Performance Computing Community: A Software Engineer's Perspective, IEEE Software, Jul 2008
M. Bast, J. Kuehn, C. McCurdy, J. Rogers, P. C. Roth and W. Yu. Early Evaluation of IBM BlueGene/P, SC2008, Austin, TX, USA, November 2008
V. Bui, B. Norris, K. Huck, L. C. McInnes, L. Li, O. Hernandez, and B. Chapman. A Component Infrastructure for Performance and Power Modeling of Parallel Scientific Applications, Proceedings of Component-Based High Performance Computing Workshop, Oct 14-17, 2008, Karlsruhe, Germany. Preprint ANL/MCS-P1538-0908, 2008. Draft bib
Laura Carrington, Dimitri Komatitsch, Michael Laurenzano, Mustafa Tikir, David Michéa, Nicolas Le Goff, Allan Snavely and Jeroen Tromp. High-Frequency Simulations of Global Seismic Wave Propagation Using SPECFEM3D_GLOBE on 62K Processors, SC08, Nov 2008. Draft
T. Chen, O. Khalili, R. L. Campbell Jr., L. Carrington, M. Tikir and A. Snavely. Performance Prediction and Ranking of Supercomputers, 2008 Advances in Computers, Vol. 72, January 2008. Article
Matthew Curtis-Maury, Karan Singh, Sally A. McKee, Filip Blagojevic, Dimitrios S. Nikolopoulos, Bronis R. de Supinski and Martin Schulz. Identifying Energy-Efficient Concurrency Levels Using Machine Learning, International Workshop on Green Computing (GreenCom'07), Austin, TX, Sep 17, 2007
Matthew Curtis-Maury, Ankur Shah, Filip Blagojevic, Dimitrios S. Nikolopoulos, Bronis R. de Supinski and Martin Schulz. Prediction Models for Multidimensional Power-Performance Optimizations on Many Cores, 17th International Conference on Parallel Architectures and Compilation Techniques (PACT-2008), Toronto, Canada, October 25–29, 2008.
Bronis R. de Supinski, Jeff Hollingsworth, Shirley Moore and Patrick Worley. Results of the PERI Survey of SciDAC Applications, SciDAC 2007, Boston, MA, June 24–28, 2007
Bronis R. de Supinski, Robert J. Fowler, Todd Gamblin, Frank Mueller, Prasun Ratn and Martin Schulz. An Open Infrastructure for Scalable, Reconfigurable Analysis, International Workshop on Scalable Tools for High-End Computing (STHEC), Kos, Greece, June 7, 2008
J. Dongarra, R. Graybill, W. Harrod et al. DARPA's HPCS Program: History, Models, Tools, Languages, Advances in Computers, vol. 72, M. V. Zelkowitz, ed. London: Academic Press, Elsevier, 2008
Wu Feng, Robert J. Fowler, Mark K. Gardner, Song Huang, Allan Porterfield. Multi-Source Event Generation And Analysis For Performance Understanding In Large-Scale Environments, Poster at ORNL Fall Creek Falls Conference, September 2008
Robert J. Fowler, Todd Gamblin, Gopi Kandaswamy, Anirban Mandal, Allan K. Porterfield, Lavanya Ramakrishnan and Daniel A. Reed. Challenges of Scale: When All Computing Becomes Grid Computing, in Lucio Grandinetti, editor, High Performance Computing and Grids in Action, Advances in Parallel Computing, IOS Press, Amsterdam, Mar 2008.
Robert J. Fowler, Todd Gamblin, Allan K. Porterfield, Patrick Dreher, Song Huang and Balint Joo. Performance Engineering Challenges: The View from RENCI, Journal of Physics: Conference Series, pg. 5, 2008, to appear.
Robert J. Fowler, Lavanya Ramakrishnan and Steven R. Thorpe. Stateful Grid Resource Selection for Related Asynchronous Tasks, Technical Report, RENCI, Chapel Hill, NC, April 2008. Document
K. Fuerlinger and S. Moore. Detection and Analysis of Iterative Behavior in Parallel Applications, 2008 International Conference on Computational Science (ICCS 2008), Krakow, Poland, Lecture Notes in Computer Science 5103, pg. 261-267, June 2008.
Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Robert J. Fowler and Daniel A. Reed. Scalable Load Balance Measurement for SPMD Codes, SC2008, Austin, Texas, November 15–21, 2008.
Todd Gamblin, Rob Fowler, and Daniel A. Reed. Scalable Methods for Monitoring and Detecting Behavioral Classes in Scientific Codes, in Proceedings of the International Parallel and Distributed Processing Symposium 2008, Miami, FL, April 2008.
Jelena Pjesivac-Grbović, Thara Angskun, George Bosilca, Graham E. Fagg, Edgar Gabriel, and Jack J. Dongarra. Performance Analysis of MPI Collective Operations, Cluster Computing Journal, vol. 10 (2007), pg. 127-143.
E. Grobelny, D. Bueno, I. Troxel, A. D. George, and J. S. Vetter. FASE: A Framework for Scalable Performance Prediction of HPC Systems and Applications, Simulation, vol. 83, no. 10, pg. 721-45, 2007.
J. Hein, H. Jagode, U. Sigrist, A. Simpson and A. Trew. Parallel 3D-FFTs for Multi-Core Nodes on a Mesh Communication Network, Proceedings of Cray User Group Conference (CUG 2008), Helsinki, Finland, May 2008.
K. A. Huck, O. Hernandez, V. Bui, S. Chandrasekaran, B. Chapman, A. D. Malony, L. C. McInnes and B. Norris. Capturing Performance Knowledge for Automated Analysis, SC08, 2008. Draft
Kevin A. Huck, Wyatt Spear, Allen D. Malony, Sameer Shende and Alan Morris. Parametric Studies in Eclipse with TAU and PerfExplorer, Workshop on Productivity and Performance Tools for HPC Application Development, August 2008
Kevin A. Huck, Allen D. Malony, Sameer Shende and Alan Morris. Knowledge Support and Automation for Performance Analysis with PerfExplorer 2.0, Journal of Scientific Programming (special issue on Large-Scale Programming Tools and Environments), vol. 16, no. 2-3, Jan 2008, pg. 123-134
Engin Ipek, Sally A. McKee, Karan Singh, Rich Caruana, Bronis R. de Supinski and Martin Schulz. Efficient Architectural Design Space Exploration via Predictive Modeling, ACM Transactions on Architecture and Code Optimization, Vol. 4, No. 4, Jan 2008, pg. 1-33
H. Jagode and J. Hein. Custom Assignment of MPI Ranks for Parallel Multidimensional FFTs: Evaluation of BG/P versus BG/L, 2008 IEEE International Symposium on Parallel and Distributed Processing with Applications (ISPA-08), Sydney, Australia, Springer, InderScience, Dec 10-12, 2008
H. Jagode, S. Alam, C. Lively, J. Vetter and J. Dongarra. Modeling Assertions for Petascale Applications and Systems, SC08, Austin, TX, November 2008
Gregory L. Lee, Dong H. Ahn, Dorian C. Arnold, Bronis R. de Supinski, Matthew Legendre, Barton P. Miller, Martin Schulz and Ben Liblit. Lessons Learned at 208K: Towards Debugging Millions of Cores, SC08, Austin, Texas, November 2008
J. Li, B. Norris, H. Johansson, L. McInnes and J. Ray. Component Infrastructure for Managing Performance Data and Runtime Adaptation of Parallel Applications, 9th International Workshop on State-of-the-Art in Scientific and Parallel Computing (PARA08), May 13-16, 2008, Trondheim, Norway.
C. Lively, S. R. Alam, V. Taylor and J. S. Vetter. A Methodology for Developing High Fidelity Communication Models for Large-scale Applications Targeted on Multicore Systems, 20th International Symposium on Computer Architecture and High Performance Computing, 2008
Bob Lucas. Performance Engineering Research Institute 2008 Annual Report, September 2008. Document
Allen D. Malony, Sameer Shende, Alan Morris, S. Biersdorff, Wyatt Spear, Kevin A. Huck and Aroon Nataraj. Evolution of a Parallel Performance System, Second International Workshop on Tools for High Performance Computing, July 2008
Gabriel Marin, Guohua Jin, and John Mellor-Crummey. Managing Locality in Grand Challenge Applications: A Case Study of the Gyrokinetic Toroidal Code, SciDAC 2008, Journal of Physics: Conference Series 125 012087
G. Marin and J. Mellor-Crummey. Pinpointing and Exploiting Opportunities for Enhancing Data Reuse, International Symposium on Performance Analysis of Systems and Software, April 2008
C. McCurdy, A. Cox, and J. S. Vetter. Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Austin, TX, 2008.
John Mellor-Crummey and Nathan Tallent. A Methodology for Accurate, Effective and Scalable Performance Analysis of Application Programs, Workshop on Tools, Infrastructures and Methodologies for the Evaluation of Research Systems, in conjunction with the 2008 IEEE International Symposium on Performance Analysis of Systems and Software, pg. 4-11, Feb 2008
J. Michalakes, J. Hacker, R. Loft, M. O. McCracken, A. Snavely, N. J. Wright, T. Spelce, B. Gorda and R. Walkup. WRF Nature Run, 2008 Journal of Physics: Conference Series, vol. 125 (2008), 012022.
Alan Morris, Wyatt Spear, Allen D. Malony and Sameer Shende. Observing Performance Dynamics using Parallel Profile Snapshots, European Conference on Parallel Processing (EuroPar 2008), August 2008
PAPI team. PAPI 3.6.1 Release Notes, http://icl.cs.utk.edu/papi/
Allan Porterfield, Robert Fowler and Mark Neyer. MAESTRO: Dynamic Runtime Power Control, in Workshop on Managed Multicore systems (MMCS), Boston, MA, Jun 2008.
Allan Porterfield, Robert J. Fowler, Anirban Mandal, and Min Yeol Lim. Performance Consistency on Multi-Socket AMD Opteron Systems, UNC/RENCI Technical Report TR-08-07, RENCI, North Carolina, December 2008
Robert Preissl, Thomas Koeckerbauer, Martin Schulz, Dieter Kranzlmueller, Bronis R. de Supinski and Daniel J. Quinlan. Detecting Patterns in MPI Communication Traces, 2008 International Conference on Parallel Processing (ICPP-08), Portland, OR, September 8-12, 2008.
Robert Preissl, Martin Schulz, Dieter Kranzlmueller, Bronis R. de Supinski and Daniel J. Quinlan. Using MPI Communication Patterns to Guide Source Code Transformations, Tools for Program Development and Analysis in Computational Science, Krakow, Poland, June 23-25, 2008.
Prasun Ratn, Frank Mueller, Bronis R. de Supinski and Martin Schulz. Preserving Time in Large-Scale Communication Traces, Twenty Second International Conference on Supercomputing (ICS 2008), Kos, Greece, June 7-12, 2008
Keith Seymour, Haihang You, and Jack Dongarra. Search Techniques for Empirical Code Optimization, 3rd International Workshop on Automatic Performance Tuning, Oct 1, 2008, Tsukuba International Congress Center, EPOCHAL TSUKUBA, Japan.
K. Seymour, H. You and J. Dongarra. A Comparison of Search Heuristics for Empirical Code Optimization, 3rd International Workshop on Automatic Performance Tuning, Tsukuba, Japan, October 1, 2008.
Allan Snavely, Laura Carrington, Bronis de Supinski, Jeffrey Vetter. Performance Modeling: Impact of Architecture Trends on Applications, ASCR Computer Science Research Principal Investigators Meeting (poster), Denver, Mar 31 - Apr 2, 2008.
F. Song, S. Moore, J. Dongarra. Analytical Modeling for Affinity-Based Thread Scheduling on Multicore Platforms, UT-CS-08-626, August 12, 2008.
Nathan Tallent, John Mellor-Crummey, Laksono Adhianto, Michael Fagan, and Mark Krentel. HPCToolkit: Performance Tools for Scientific Computing, SciDAC 2008, Journal of Physics Conference Series 125 012088.
Nathan R. Tallent. Performance Analysis of Optimized Code: Binary Analysis for Performance Insight, Master's Thesis, VDM Verlag, Dr. Müller, 2008.
Jeffery L. Tilson, Mark S. C. Reed and Robert J. Fowler. Workflows for Performance Evaluation and Tuning, in 2008 IEEE International Conference on Cluster Computing (Cluster2008), pg. 8, Tsukuba, Japan, Sep 2008, IEEE.
Lin-Wang Wang, Byounghak Lee, Hongzhang Shan, Zhengji Zhao, Juan Meza, Erich Strohmaier, David Bailey. Linearly Scaling 3D Fragment Method for Large-Scale Electronic Structure Calculations, SC08, Nov 2008. Draft
S. Williams, J. Carter, L. Oliker, J. Shalf and K. Yelick. Lattice Boltzmann Simulation Optimization on Leading Multicore Platforms, International Parallel & Distributed Processing Symposium (IPDPS), 2008. Best paper, applications track
Samuel Williams, Kaushik Datta, Jonathan Carter, Leonid Oliker, John Shalf, Kathy Yelick and David H. Bailey. PERI -- Auto-Tuning Memory-Intensive Kernels for Multicore, Journal of Physics: Conference Series, vol. 125 (2008), Nov. 2008 PDF.
F. Wolf, B. Wylie, E. Abraham, D. Becker, W. Frings, K. Fuerlinger, M. Geimer, M. Hermanns, B. Mohr, S. Moore, M. Pfeifer, Z. Szebenyi. Usage of the Scalasca Toolset for Scalable Performance Analysis of Large-scale Parallel Applications, 2nd International Workshop on Tools for High Performance Computing, Resch, Keller, Himmler, Krammer, Schulz, eds. Stuttgart, Germany, Springer, pg. 157-167, July 2008
Felix Wolf, Bernd Mohr, Jack Dongarra, Shirley Moore. Automatic Analysis of Inefficiency Patterns in Parallel Applications, Concurrency and Computation: Practice and Experience, vol. 19, no. 11, pg. 1481-1496, Aug 2007.
P. H. Worley. Early Evaluation of the IBM BG/P, LCI International Conference on High Performance Clustered Computing, Urbana, IL, Apr 29-May 1, 2008. Best paper award
W. Yu, J. S. Vetter and H. S. Oral. Performance Characterization and Optimization of Parallel I/O on the Cray XT, 22nd IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008), 2008

2007 and earlier

Publications from 2007 and earlier, including PERC-2 publications, are on this page.