Bernd Mohr: To Infinity and Beyond?! On Scaling Performance Measurement and Analysis Tools for Parallel Programming.

Tuesday, October 2nd, 14:00

Abstract. The number of processor cores available in high-performance computing systems is steadily increasing. A major driver is the current trend toward multi-core and many-core processor chip architectures. In the latest list of the TOP500 Supercomputer Sites, 63% of the listed systems have more than 1024 processor cores, and the average is about 2400 cores.

While this promises ever more compute power and memory capacity for tackling today's complex simulation problems, it forces application developers to greatly improve the scalability of their codes in order to exploit it. This often requires new algorithms, methods or parallelization schemes, because many well-known and accepted techniques stop working at these large scales. The problems start with simple things: opening one file per process to save checkpoint information, collecting the simulation results of the whole program on a single process via a gather operation, or previously unimportant O(n²)-type operations that quickly dominate the execution time. Unfortunately, many of these performance problems only show up when running with very large numbers of processes and cannot easily be diagnosed or predicted from measurements at smaller scales. Detecting and diagnosing these performance and scalability bottlenecks requires sophisticated performance instrumentation, measurement and analysis tools. Simple tools typically scale very well, but the information they provide becomes less and less useful at these scales.
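
As a rough illustration of why such patterns go unnoticed at small scale, the following sketch computes two quantities as a function of the process count n: the receive buffer a single root process needs for a gather of every process's results, and the number of process pairs touched by an all-pairs O(n²) step. The function names and the 8 MiB per-process result size are hypothetical assumptions chosen for illustration only; they are not measurements from any tool or application discussed in this talk.

```python
# Hypothetical illustration: growth of two innocuous-looking patterns with process count n.
# The names and the per-process result size are illustrative assumptions, not measured data.

def gather_bytes_at_root(n_procs: int, bytes_per_proc: int) -> int:
    """Bytes a single root process must receive when gathering one buffer from every process."""
    return n_procs * bytes_per_proc


def pairwise_ops(n_procs: int) -> int:
    """Number of ordered process pairs handled by an all-pairs, O(n^2)-type step."""
    return n_procs * (n_procs - 1)


if __name__ == "__main__":
    per_proc = 8 * 2**20  # assumed 8 MiB of results per process (hypothetical)
    for n in (64, 1024, 2400, 65536):
        gib = gather_bytes_at_root(n, per_proc) / 2**30
        print(f"n={n:6d}  root gather buffer={gib:9.1f} GiB  pairs={pairwise_ops(n):,}")
```

The gather buffer grows linearly and the pair count quadratically, so both are negligible at 64 processes but can exhaust a single node's memory or dominate run time at tens of thousands of processes.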

Clearly, tool developers face exactly the same problems as application developers when enhancing their tools to handle and support highly scalable applications. In this talk we discuss the major limitations of current state-of-the-art performance measurement, analysis and visualization methods and tools. We give an overview of experiments, new approaches and first results of performance tool projects that try to overcome these limits. These include new, scalable and enhanced result visualization methods in the performance analysis framework TAU [1], methods for automatically extracting key execution phases from long traces used by the Paraver toolset [2], more scalable client/server tool architectures such as that of VampirServer [3] for scalable timeline visualization, and highly parallel automatic searches for performance bottlenecks as used by the Scalasca toolset [4].

[1] S. Shende and A. D. Malony. TAU: The TAU Parallel Performance System. International Journal of High Performance Computing Applications, Volume 20, Number 2, pp. 287–331, Summer 2006.
[2] J. Labarta, J. Gimenez, E. Martinez, P. Gonzales, H. Servat, G. Llort, and X. Aguilar. Scalability of visualization and tracing tools. In Proceedings Parallel Computing (ParCo) 2005, Malaga, Spain, September 2005.
[3] A. Knüpfer, H. Brunst, and W. E. Nagel. High Performance Trace Visualization. In Proceedings of the 13th Euromicro Conference on Parallel, Distributed and Network-based Processing, Lugano, Switzerland, February 2005.
[4] M. Geimer, F. Wolf, B. J. N. Wylie, and B. Mohr. Scalable Parallel Trace-Based Performance Analysis. In Proceedings of the 13th European Parallel Virtual Machine and Message Passing Interface Conference, Springer LNCS 4192, pp. 303–312, Bonn, Germany, September 2006.

About the author. Bernd Mohr began designing and developing tools for the performance analysis of parallel programs with his diploma thesis in 1987 at the University of Erlangen, Germany, and continued this work in his Ph.D. During a three-year postdoctoral position at the University of Oregon, he was responsible for the design and implementation of the original TAU performance analysis framework for the parallel programming language pC++. Since 1996 he has been a senior scientist at the Research Center Jülich. Besides being responsible for user support and training for performance tools, he leads the KOJAK research group on automatic performance analysis of parallel programs, a joint project with the Innovative Computing Laboratory at the University of Tennessee. He was a founding member and work package leader of APART, the European Community IST working group on automatic performance analysis. He is the author of several dozen conference and journal articles on the performance analysis and tuning of parallel programs.
EuroPVM/MPI 2007