Perfdb Run Rules - PERI

This page is a re-formatting of the Word document entitled: PERI Performance Data Collection Run Rules

PERI Performance Data Collection Run Rules

Draft 12/18/2006

Updated 12/27/2006

The purpose of this document is to set guidelines for what metadata should be collected so that the results of a performance experiment can be interpreted correctly and repeated. Because many factors affect the performance of an application, the metadata that need to be captured span a wide range of information about various entities, including the application code, the transformations and tuning applied to the code (e.g., source-to-source preprocessing, automated tuning), the compiler environment, the libraries used, the input sets, the performance instrumentation, and the runtime environment. Because performance database systems record this information in different ways, an exact specification down to the level of attribute names is not possible. Instead, we define the categories of information that should be captured and what each category should include, and we leave the specific details of how attributes are named and stored to the implementation.
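
As a minimal sketch of what "leaving attribute names to the implementation" can look like, the hypothetical Python record below fixes only seven category keys (mirroring the categories listed next) and lets each category hold free-form key/value attributes. The class name ExperimentMetadata, the category key spellings, and the JSON serialization are assumptions made for illustration; they are not part of any PERI database schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List

# Hypothetical category keys, one per category in the run rules.
CATEGORIES = (
    "source_code",
    "transformations",
    "compiler_environment",
    "libraries",
    "input_sets",
    "instrumentation",
    "runtime_environment",
)


@dataclass
class ExperimentMetadata:
    """Metadata for one performance experiment, grouped by category.

    Only the category names are fixed; attribute names inside each
    category are chosen freely by the implementation.
    """
    categories: Dict[str, Dict[str, Any]] = field(
        default_factory=lambda: {name: {} for name in CATEGORIES}
    )

    def record(self, category: str, key: str, value: Any) -> None:
        """Store one attribute, rejecting unknown category names."""
        if category not in self.categories:
            raise KeyError(f"unknown category: {category}")
        self.categories[category][key] = value

    def missing_categories(self) -> List[str]:
        """Return categories that have no attributes recorded yet."""
        return [name for name, attrs in self.categories.items() if not attrs]

    def to_json(self) -> str:
        """Serialize the whole record as sorted, indented JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

An implementation could call missing_categories() before storing a record to flag experiments whose metadata are incomplete.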

The following categories of information are needed:

  1. Source code information. The exact version of the source code of all components making up the application should be unambiguously recorded. Ways of doing this include storing the source code itself, storing a secure hash of the source code, and tying the version to a version control system.
  2. Transformations. Any and all automated transformations that are applied to the source code should be specified in a repeatable fashion. Such transformations include source-to-source preprocessing and replacement of portions of the source code with automatically generated code.
  3. Compiler environment. The exact version of the compiler and the compiler options used should be recorded. Any configuration information for the compiler environment should also be included.
  4. Libraries used. The exact versions of all libraries linked with the application code should be recorded, including dynamically linked shared libraries.
  5. Input sets. The exact versions of all input files used should be captured. Options include storing the input sets themselves, storing a secure hash of each input data file, or tying the versions used to a version control system.
  6. Performance instrumentation. Because instrumentation to collect performance data affects the performance of the code, and because different versions of instrumentation libraries can affect both the data collected and the overhead of collecting it, the exact steps carried out to instrument the code should be recorded. Note that in the case of source code instrumentation, this information may be included in category 2 (Transformations) above. The information should include the tool and tool version used to collect the data and the configuration of that tool. Some information about performance instrumentation may be included in category 7 (Runtime environment) below.
  7. Runtime environment. The following information should be recorded:
    1. date and time (to the second) when the job started
    2. machine architecture
    3. OS/kernel version
    4. scheduler software versions
    5. network interface firmware versions
    6. version information about the parallel file system
    7. contents of the scheduler queue when the job was launched
    8. nodes on which the job was run
    9. environment variable settings
    10. job completion and output status and any error messages
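
For categories 1 and 5, one way to record a secure hash is sketched below, assuming the source tree or input files are available as ordinary files on disk. It uses SHA-256 from Python's standard hashlib module; the function names and the scheme for combining per-file digests into a single digest for the whole tree are hypothetical choices made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of one file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_tree(root: Path, suffixes: Iterable[str] = ()) -> Dict[str, str]:
    """Map each file under root (as a relative POSIX path) to its digest.

    If suffixes is non-empty, only files with those extensions are hashed.
    """
    root = Path(root)
    wanted = set(suffixes)
    result = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and (not wanted or path.suffix in wanted):
            result[path.relative_to(root).as_posix()] = sha256_file(path)
    return result


def tree_digest(file_digests: Dict[str, str]) -> str:
    """Combine per-file digests, in path order, into one digest."""
    digest = hashlib.sha256()
    for rel_path in sorted(file_digests):
        # Include the path so that renaming a file changes the result.
        digest.update(rel_path.encode("utf-8") + b"\0")
        digest.update(file_digests[rel_path].encode("ascii") + b"\n")
    return digest.hexdigest()
```

Storing the per-file map alongside the combined digest lets a later run see not only that the code or inputs changed, but which files changed.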
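
On Linux, the shared libraries actually loaded by a running process (category 4) can be read from /proc/<pid>/maps. The sketch below assumes that file's standard layout, in which an optional sixth whitespace-separated field holds the path of the mapped file. It records paths only; pairing each path with a package version or a file hash would still be needed to pin exact library versions. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List

# Matches shared-object file names such as libm.so or libc.so.6.
_SO_PATTERN = re.compile(r"\.so(\.\d+)*$")


def shared_libraries_from_maps(maps_text: str) -> List[str]:
    """Return sorted, de-duplicated shared-object paths from maps text."""
    libs = set()
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        # The sixth field, when present, is the mapped file's path
        # (or a pseudo-name such as [heap] or [stack]).
        if len(fields) == 6:
            path = fields[5].strip()
            if path.startswith("/") and _SO_PATTERN.search(path):
                libs.add(path)
    return sorted(libs)


def loaded_shared_libraries(pid: str = "self") -> List[str]:
    """Linux only: list shared objects currently mapped into a process."""
    return shared_libraries_from_maps(Path(f"/proc/{pid}/maps").read_text())
```

Reading the maps of the running application, rather than inspecting the binary, captures libraries opened at run time as well as those resolved at link time.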
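
Several of the runtime-environment items can be captured portably from a launcher script. The sketch below assumes a Python wrapper that starts the job as a subprocess; it records the start time to the second (in UTC), the machine architecture, the OS and kernel version, the launching host, the environment variables, and the exit status and error output. Scheduler, network firmware, file system, and queue information (items 4 through 7) are site-specific and are not collected here, and for a parallel job the full node list would come from the scheduler rather than from the launching host. The function names are hypothetical.

```python
import datetime
import os
import platform
import socket
import subprocess
from typing import Any, Dict, List


def capture_runtime_environment() -> Dict[str, Any]:
    """Collect the portable subset of the runtime-environment items."""
    uname = platform.uname()
    start = datetime.datetime.now(datetime.timezone.utc).replace(microsecond=0)
    return {
        "start_time_utc": start.isoformat(),  # truncated to whole seconds
        "machine_architecture": uname.machine,
        "os": uname.system,
        "kernel_release": uname.release,
        "kernel_version": uname.version,
        # Only the host running this wrapper, not every compute node.
        "launch_host": socket.gethostname(),
        "environment": dict(os.environ),
    }


def run_and_record(cmd: List[str]) -> Dict[str, Any]:
    """Run a job command, recording its environment and completion status."""
    record = capture_runtime_environment()  # captured before the job starts
    completed = subprocess.run(cmd, capture_output=True, text=True)
    record["exit_status"] = completed.returncode
    record["stderr"] = completed.stderr
    return record
```

Because the environment snapshot copies every variable, a site may want to filter out credentials or tokens before storing the record.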