Perfdb Database Schemas

From PERI


This page shows the main features of the schemata of some of the most widely used parallel performance databases. The goal is to understand how they can interoperate, either by translation through a common format or by pairwise translation rules. If this is successful, it will have several happy results, not the least of which is the decoupling of the analysis tools from the tool used to collect the data.

There is not yet a defined format to present this information. Participants are encouraged to include any and all information deemed useful.

Performance data repository main page.



PerfDMF

The context information in our database is organized into an application, experiment, trial hierarchy. Trials belong to one experiment, and experiments belong to one application. The application, experiment and trial tables can be used to save whatever context data the user of PerfDMF wants. Our tools do not require these three tables to have any particular columns, except for a descriptive name. The included default schema contains some examples of context data columns.
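The hierarchy described above can be sketched with an in-memory SQLite database. This is a minimal illustration, not the PerfDMF default schema: apart from the descriptive `name` column, every column (`version`, `node_count`) and both helper functions are assumptions chosen for the example.

```python
import sqlite3

# Hypothetical minimal schema: only `name` is required by the tools;
# all other columns stand in for user-defined context data.
SCHEMA = """
CREATE TABLE application (
    id      INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    version TEXT                 -- example context column (assumption)
);
CREATE TABLE experiment (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    application INTEGER NOT NULL REFERENCES application(id)
);
CREATE TABLE trial (
    id         INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    experiment INTEGER NOT NULL REFERENCES experiment(id),
    node_count INTEGER           -- example context column (assumption)
);
"""


def build_demo_db():
    """Create the three tables and load one application with two trials."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute(
        "INSERT INTO application (id, name, version) VALUES (1, 'demo_app', '1.0')"
    )
    conn.execute(
        "INSERT INTO experiment (id, name, application) VALUES (1, 'scaling study', 1)"
    )
    conn.executemany(
        "INSERT INTO trial (name, experiment, node_count) VALUES (?, 1, ?)",
        [("run_16", 16), ("run_32", 32)],
    )
    return conn


def trials_for_application(conn, app_name):
    """Walk trial -> experiment -> application and return the trial names."""
    rows = conn.execute(
        """
        SELECT t.name
        FROM trial t
        JOIN experiment e ON t.experiment = e.id
        JOIN application a ON e.application = a.id
        WHERE a.name = ?
        ORDER BY t.id
        """,
        (app_name,),
    )
    return [row[0] for row in rows]
```

Because each trial points to exactly one experiment and each experiment to exactly one application, a query for all trials of an application is a two-step join.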

PerfDMF schema

The data we automatically collect is limited to performance information and the following fields: full call path, phase, source file and line number for each instrumented event. These are all encoded in the event name. Currently, any other context data associated with the profile data has to be updated in the database manually. In the near future, we will change the TAU measurement library to collect as much of the machine information as we can (hostname, system info, etc.).
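This page does not specify how these fields are encoded in the event name. The sketch below assumes, purely for illustration, that call path elements are separated by ` => ` and that a source location is appended as `[{file} {line,column}-{line,column}]`. Both conventions and the function name `parse_event_name` are assumptions and should be checked against real profile data.

```python
import re

CALLPATH_SEP = " => "  # assumed separator between call path elements
# Assumed location suffix, e.g. "[{solver.c} {42,1}-{97,1}]"
LOCATION_RE = re.compile(r"\s*\[\{(?P<file>[^}]+)\}\s*\{(?P<line>\d+)")


def parse_event_name(event_name):
    """Split an event name into call path elements with optional file and line."""
    path = []
    for element in event_name.split(CALLPATH_SEP):
        match = LOCATION_RE.search(element)
        if match:
            path.append(
                {
                    "name": element[: match.start()].strip(),
                    "file": match.group("file"),
                    "line": int(match.group("line")),
                }
            )
        else:
            path.append({"name": element.strip(), "file": None, "line": None})
    return path
```

Decoding the name once at import time would let the call path and source location be stored as separate, queryable fields rather than being re-parsed by every analysis tool.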

That said, the data we would LIKE to collect and use would include anything that would give clues about application behavior, and any parameters that an analyst would use in a parametric study. A non-exhaustive list and some example values would include:

  • Application information: name, version, paradigm (e.g. openmp, mpi/pvm, shmem, pgas, hybrid, other), model (e.g. master/worker, divide & conquer, pipeline, other)
  • Compiler information: compiler name, compiler version, flags, options, libraries
  • Execution information: os type, os version, machine name, process count, node names and associated processes, algorithm used (e.g. fft or dft), processor type, processor attributes, input data, data decomposition method (e.g. 2D or 3D, pencils or slabs), mpi implementation (e.g. OpenMPI, mpich, vendor-supplied), communication topology, environment variables, command line arguments

--Khuck 10:23, 12 December 2006 (PST)


RENCI HPC Database / SvPablo

HPC Database Document

Introduction

HPC Database is a web-based infrastructure originally designed for the SciDAC QCD project. The goal is to create a knowledge base for maintaining and sharing the performance analysis results for the QCD community.

The database stores the performance data collected by the RENCI performance team for QCD applications on various high performance computing systems. It also provides web interfaces for users to browse the performance data, perform statistical analysis, and conduct performance comparisons.

HPC Database Tools

Command line importer: a tool to upload the performance files via the command line. The files are currently restricted to the SvPablo performance file format.

Syntax: java -jar dbimport-0.1.jar TYPE simple_metadata data_file

·	TYPE: the file format. Currently only the SDDFXML format is supported; the importer accepts files generated by the SDDF->XML converter. The code is designed to be extended to other formats.
·	simple_metadata: a file listing properties that provide new entries for performance tool, application, platform, and platform type. If the “Use.Existing” line is uncommented, the importer uses the existing properties in the database.
·	data_file: the performance file in XML format.

Example: java -jar dbimport-0.1.jar SDDFXML svpablo_doris_metadata.properties np128_24_24_24_24_papi.performance.xml
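When many performance files have to be imported, the command line above can be assembled by a small script. The following is a minimal sketch; the helper name `build_import_command` and the idea of validating TYPE before launching Java are assumptions, while the argument order follows the syntax given above.

```python
SUPPORTED_TYPES = {"SDDFXML"}  # the only format this page lists as supported


def build_import_command(file_type, metadata_file, data_file,
                         jar="dbimport-0.1.jar"):
    """Return the argument list for one importer invocation."""
    if file_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported file format: {file_type}")
    return ["java", "-jar", jar, file_type, metadata_file, data_file]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once per data file, so that a failed import stops the batch instead of being silently skipped.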

Metadata XML Schema: a schema to describe the metadata in the database in XML format.

Web-based upload tool: a tool to upload the performance files and associated metadata via a web interface. It is currently at http://dante1:9888/metagui-0.1/pages/Welcome.jsp

Web-based data browsing and queries:

The web site for the online query is currently located at: http://dante1.renci.org:9888/hpcdbgui-0.1/pages/

There are three types of data queries currently available:

·	Performance Data Browsing by Application
·	Scalability Analysis
·	Simple Ratio Calculations for Selected Performance Metrics

Users who are familiar with SQL can also submit SQL queries online.
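As an illustration of what a scalability analysis computes, the sketch below derives speedup and parallel efficiency from wall-clock times measured at several process counts. It is not the code behind the web interface; the function name and the input layout (process count mapped to seconds) are assumptions.

```python
def scalability_table(runs):
    """Compute (process count, speedup, efficiency) rows.

    `runs` maps a process count to the wall-clock time in seconds.
    The run with the fewest processes is used as the baseline.
    """
    base_procs = min(runs)
    base_time = runs[base_procs]
    table = []
    for procs in sorted(runs):
        speedup = base_time / runs[procs]
        # Efficiency is the speedup relative to the growth in process count.
        efficiency = speedup * base_procs / procs
        table.append((procs, speedup, efficiency))
    return table
```

For example, times of 100 s, 50 s, and 40 s on 1, 2, and 4 processes give speedups of 1.0, 2.0, and 2.5, so efficiency drops from 1.0 to 0.625 at 4 processes.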

Database Structure

The database currently has 34 tables: 6 tables for performance data and 28 tables for related metadata and utilities. The database table diagram is shown in the Appendix. The tables for both categories are listed below:

Performance data tables:
·	For hardware performance counter data for instrumented events:
o	HW_PERF_DATA
o	SUMMARY_HW_PERF_DATA 
·	For timing and general information about instrumented events (Procedures, Calls and Loops)
o	PROC_DATA
o	PROC_CALL_SEQ
o	INSTR_EVENT_PERF_DATA
o	SUMMARY_INSTR_EVENT_PERF_DATA 

Metadata tables:
·	Persons associated with the runs (who did the runs)
o	PERSON_DATA
o	RUN_CONTRIB
o	APP_CONTRIB 
·	Physical Location of nodes (where the actual performance files are located)
o	LOCATION_DATA 
·	Nodes description (Nodes where the run executed)
o	NODE_DATA
o	NODE_TYPE_DATA
o	NODE_CFG_DATA
o	CPU_TYPE_DATA
o	OS_DATA 
·	Platforms description (Platform information associated with the nodes)
o	PLATFORM_DATA
o	PLATFORM_TYPE_DATA 
·	Application (application description, including version, name, etc.)
o	APP_DATA
o	APP_CONFIG_RUN
o	APP_CONFIG_DATA
o	APP_FILES 
·	Performance tool used to collect data (which tools are used in the specific run)
o	PTOOL_DATA
o	PTOOL_CONFIG_DATA
o	PTOOL_CONFIG_RUN
o	PTOOL_FILES 
·	Miscellaneous (run time configuration, data collection method, data input, etc.)
o	RUN_DATA
o	RUN_CONFIG_DATA
o	RUN_ADDED_DATA
o	METRICS
o	ANNOTATION
o	RUN_COLL_METHOD
o	COLL_METHOD_DATA
o	COLUMN_UNITS
o	ID 
HPC Database Diagram
HPC-DB

Prophesy

The following text is a reformatting of this PDF: Prophesy DB document

For detailed information, please see Prophesy website: http://prophesy.cs.tamu.edu

Introduction

Prophesy is a web-based infrastructure for the performance analysis and modeling of parallel and distributed applications. Prophesy includes a database for the archival of performance data, system features and application details, to aid in the analysis and modeling of applications. Prophesy consists of the following components:

Details of Prophesy Database

Prophesy Database and Data Collection

The Prophesy Database was designed to accommodate queries that lead to the development of performance models, allow for prediction of performance on other systems, and allow for one to obtain insight into methods to improve the performance of the application on a given distributed system. Hence, the database facilitates the following query types:

  • Identify the best implementation of a given function for a given system configuration (identified by the run-time system, operating system, processor organization, etc.). This can be implemented by querying the database and comparing the results.
  • Use the raw performance data to generate analytical (nonlinear or linear) models of a given function or application; the analytical model can be used to extrapolate the performance under different system scenarios and can be used to assist programmers in optimizing the strategy or algorithms in their programs.
  • Use the performance data to analyze application-system trends, such as scalability, speedup, I/O requirements, communication requirements, etc. This can be implemented by querying the database and evaluating the corresponding formula.
  • Use the performance data to analyze user specific metrics such as coupling between functions.
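The second query type, fitting an analytical model to raw performance data and extrapolating from it, can be illustrated with an ordinary least-squares line. This is only a sketch of the idea, not Prophesy's modeling method; the function names and the choice of a single independent variable are assumptions.

```python
def fit_linear(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Evaluate a fitted (slope, intercept) model at x."""
    slope, intercept = model
    return slope * x + intercept
```

Here `xs` might hold a problem size and `ys` the measured run time of a function; `predict` then extrapolates the run time to a size that was never measured, with the usual caveat that a linear model is only trustworthy near the measured range.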

Prophesy assumes that applications can be decomposed into modules, which can be further decomposed into functions that can be decomposed into basic units in a hierarchical manner as depicted in Figure 1 (see PDF).

Each component in the above structure has the following meaning:

  • Application: refers to the complete large-scale application.
  • Modules: refer to the various files that comprise the application; it is assumed that the application designer uses some modularity in the application design.
  • Functions: refer to the different function routines that may be contained in a given module. Users will be asked to associate a "pure function" name with their given function where appropriate. For example, a user may identify their function "genfft" as the pure function FFT. Pure functions are widely used functions such as conjugate gradient or gaussian elimination.
  • Basic Units: refer to a code segment that may be of smaller granularity than a function but higher granularity than a basic block. For example, a segment of nested loops would be considered one basic unit.
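The decomposition above maps naturally onto nested record types. The following sketch uses Python dataclasses; the class and field names are assumptions made for illustration and do not correspond to Prophesy table or column names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BasicUnit:
    label: str  # e.g. a description of one nested-loop segment


@dataclass
class Function:
    name: str
    pure_function: Optional[str] = None  # e.g. "FFT" for a user's "genfft"
    basic_units: List[BasicUnit] = field(default_factory=list)


@dataclass
class Module:
    file_name: str
    functions: List[Function] = field(default_factory=list)


@dataclass
class Application:
    name: str
    modules: List[Module] = field(default_factory=list)


def functions_implementing(app, pure_function):
    """Return the names of all functions tagged with the given pure function."""
    return [
        func.name
        for module in app.modules
        for func in module.functions
        if func.pure_function == pure_function
    ]
```

Tagging user functions with a pure function name is what makes it possible to compare, say, every FFT implementation across applications even though each one has a different local name.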

Prophesy Database

The Prophesy database has a hierarchical organization, consistent with the hierarchical structure of the applications. The schema shown in Figure 2 includes the performance database, system models database and template database. The entities in the database are organized into four areas: application information, executable information, run information and performance statistics. Descriptions of these four areas are given below.

  • Application Information: includes one entity that gives the application name, version number, a short description, owner information and password (such that only the owner can modify or add data for a given application). It is assumed that an application goes through various versions as one adds different functionalities over time. Data are placed into this entity when a new application is being developed.
  • Executable Information: includes all of the entities related to generating an executable of an application. These entities include details about compilers, libraries, modules, functions, and the control flow. It is assumed that applications may be developed using multiple languages. The executable entities include details about the executable, modules, functions, model_templates (used for model generation), compiler, libraries and control_flow. Data are placed into these entities when a new executable is generated.
  • Run Information: includes all of the entities related to running an executable, which includes the system information and inputs used for execution. This system may be a single processor, a single parallel machine or a distributed system. These entities include the inputs, the system(s) used for execution, and the time and date of execution. Data are placed into these entities for each run of a given executable.
  • Performance Statistics Information: includes all of the entities related to the raw performance data collected during execution. These entities include application_performance, function_performance, and basic_unit_performance. For each level of performance we store the following:
    • Number of times the piece of code was executed
    • Average run time of the piece of code
    • Standard deviation of the average run time
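The three stored statistics can be computed from the individual timings of a code segment as in the sketch below. The function name is hypothetical, and the use of the population standard deviation (dividing by the number of samples) is an assumption, since the text does not say which form Prophesy stores.

```python
import math


def summarize(run_times):
    """Return execution count, average run time, and standard deviation."""
    count = len(run_times)
    average = sum(run_times) / count
    # Population variance: divide by count (assumption, see lead-in).
    variance = sum((t - average) ** 2 for t in run_times) / count
    return {"count": count, "average": average, "std_dev": math.sqrt(variance)}
```

Storing only these three values per application, function, or basic unit keeps the database small, at the cost of losing the individual timings.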

(see PDF) Figure 2. Framework of Prophesy Database Schema

The detailed Prophesy database schema is as follows: Image:pdbschema.gif


Data Collection

PAIDE (Prophesy Automatic Instrumentation and Data Entry), shown in Figure 3, is the data collection component of the Prophesy system, with the goal of minimizing instrumentation overhead and code. It focuses on the automatic instrumentation of codes at the level of basic blocks, procedures, or functions. The default mode consists of instrumenting the entire code at the level of basic loops and procedures. A user can specify that the code be instrumented at a finer granularity than that of loops or identify the particular events to be instrumented. The resultant performance data is automatically placed in the performance database at the end of the program execution, and is used by the data analysis component to produce an analytical performance model with coefficients, at the granularity specified by the user. There are two ways to input the performance data into the database: interactive data entry and automatic data entry. Automatic data entry entails sending the performance data as a SOAP message to the Prophesy database and using Perl scripts to automatically process the performance data files. Interactive data entry uses form interfaces on the Prophesy website to enter the data into the database manually.

Image:PAIDE.jpg

Figure 3. Prophesy automatic instrumentation and data entry framework

Run Rules

To run an application using PAIDE for instrumentation entails the following steps:

  1. Register the application and executable with the Prophesy database. Currently, this requires manual data entry that is done only once per executable.
  2. Run PAIDE on the full application to instrument the application code and get call graph and performance relations (the module name and line of code where instrumentation has been inserted).
    1. The call graph and performance relations are packaged as a SOAP message to the Prophesy database.
  3. Compile the application as usual.
  4. Run the executable any number of times.
    1. For each run of the executable, the performance data for each processor, along with the date stamp and IP address of node 0, are sent to the Prophesy database as a SOAP message.

PerfTrack

The basic unit of stored data is a performance result, a scalar floating-point value. Associated with each performance result is:

  • A metric string, describing what is being measured (such as floating point operations or execution time)
  • A units string, such as "seconds"
  • One or more contexts, which describe everything known about the circumstances of the measurement. A typical measurement will have only one context, but there are situations where multiple contexts make sense.

A context is a set of resources, and a resource is an item of information about a measurement. For example, in a measurement of the execution time of a particular function on one process of a parallel job, possible resources are the function name, the identifier of the node where the function executed, the type of processor, the compiler used to build the application, the list of libraries bound to the application, the command-line flags given to the executable, and so on. In short, anything that is controlled or observed when a measurement is made can be a resource. It is up to the user to decide which resources to store with a performance measurement. A resource type is just what the name suggests: a category of resources. For example "gcc" could be a resource of type "compiler." PerfTrack has no required resource types, although it's common to include at least the application name and some identification of the execution. Resources and resource types are stored as strings.

Resources can be hierarchical, so one resource can imply many subresources. For example, a parallel machine is a resource that can include node resources, which themselves include processor resources. The type of the low level resource would be "machine/node/processor" and the name could be "/MyCluster/node288/proc1". (By convention, resource types have no leading /, but resource names do. The reason for this, if there ever was one, has been forgotten.) The value of these hierarchies is their ability to simplify searches for data. If an experiment has gathered timing data for a given function on all processors in a parallel job, a query can request the results for "/MyCluster" without specifying all the individual processors.
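The search simplification described above amounts to matching on a path prefix. The sketch below is a minimal illustration over an in-memory list of (resource name, value) pairs; it is not PerfTrack's query code, and both function names are assumptions.

```python
def is_under(resource_name, ancestor):
    """True if resource_name is the ancestor itself or lies below it."""
    ancestor = ancestor.rstrip("/")
    # Require a "/" after the prefix so "/MyCluster" does not match "/MyCluster2".
    return resource_name == ancestor or resource_name.startswith(ancestor + "/")


def results_under(results, ancestor):
    """Select the values whose resource name falls under the given ancestor."""
    return [value for name, value in results if is_under(name, ancestor)]
```

A query for "/MyCluster" then returns the results for every node and processor of that machine, while a query for "/MyCluster/node288" narrows the selection to one node.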

Resources can also have attributes, which describe the resource in some way. For example, a resource representing a processor could have the attribute names clock rate, manufacturer, and model. Corresponding attribute values might be 2 GHz, Intel, and Xeon. Unlike resources, attributes are not hierarchical. PerfTrack can search for results based on attributes, so a query could ask for results from all measurements that ran on Intel Xeon processors.

The expected usage of this data model is that a user would determine what resource types to control or gather for an experiment. These are the independent variables. Each combination of specific values for these variables that occurs during the experiment defines a context, and one or more measurements can be made for a given context. For example, a context could describe a machine, a set of compiler flags, a version of an application, a set of inputs to an application, and the name of a function in that application. Measured values (dependent variables) might be the cumulative inclusive and exclusive execution time and the FLOP count. Attributes could be used to annotate certain resource values (but not individual performance results, at present). Essentially, this is a way to add descriptive fields to resources, such as processor nodes, without having to repeat this description whenever the resource appears in a context. For example, "processor type" could be a resource type in its own right, but by making it an attribute of a particular compute node, we avoid having to put "processor type" in each context that uses that compute node.
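The data model described in this section (results, contexts, resources, and attributes) can be sketched with a few record types. Everything named here (`Resource`, `PerformanceResult`, `results_with_attribute`, and their fields) is an assumption for illustration and does not reflect PerfTrack's actual tables or API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Resource:
    rtype: str  # resource type, e.g. "machine/node" (no leading "/")
    name: str   # resource name, e.g. "/MyCluster/node288"


@dataclass
class PerformanceResult:
    metric: str                           # what is measured
    units: str                            # e.g. "seconds"
    value: float                          # the scalar result
    contexts: List[FrozenSet[Resource]]   # usually exactly one context


def results_with_attribute(results, attributes, attr_name, attr_value):
    """Select results whose contexts include a resource with the attribute.

    `attributes` maps a Resource to its attribute dictionary, so the
    description is stored once per resource rather than once per context.
    """
    matches = []
    for result in results:
        resources = set().union(*result.contexts)
        if any(attributes.get(r, {}).get(attr_name) == attr_value
               for r in resources):
            matches.append(result)
    return matches
```

Keeping attributes in a separate mapping mirrors the point made above: a compute node's processor type is recorded once and is found by any query that reaches that node through a context.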

IPM

The Integrated Performance Monitoring (IPM) representation is XML, so one could infer a schema from it. Below is an example file. A Python script is also available to convert the IPM XML profile to PERI-DB XML.
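Because the representation is plain XML, a profile can be read with a standard XML parser. The sketch below is not the conversion script mentioned above; it is a minimal, independent illustration that extracts per-function MPI times using Python's `xml.etree.ElementTree`. The `SAMPLE` string is a heavily trimmed stand-in for a profile, reusing only element and attribute names that appear in the example file, and the function name `mpi_time_by_function` is an assumption.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for an IPM profile (illustrative only).
SAMPLE = """
<ipm_job_profile>
<task mpi_rank="0" mpi_size="35">
<regions n="1">
<region label="ipm_noregion" wtime="7.6850e+04">
<func name="MPI_Bcast" count="581767"> 3.6319e+01 </func>
<func name="MPI_Barrier" count="123"> 3.0227e+00 </func>
</region>
</regions>
</task>
</ipm_job_profile>
"""


def mpi_time_by_function(xml_text):
    """Return (rank, region label, function name, call count, seconds) rows."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for task in root.iter("task"):
        rank = int(task.get("mpi_rank"))
        for region in task.iter("region"):
            for func in region.iter("func"):
                rows.append(
                    (
                        rank,
                        region.get("label"),
                        func.get("name"),
                        int(func.get("count")),
                        float(func.text),  # element text holds the time
                    )
                )
    return rows
```

Rows in this flat form are straightforward to load into any of the relational schemas described on this page.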

<?xml version="1.0" encoding="iso-8859-1"?>
<ipm_job_profile>
<task ipm_version="0.918" cookie="1185656221.344615" mpi_rank="0" mpi_size="35" stamp_init="1185656221.344615" stamp_final="1185733072.567627" username="auser" groupname="agroup" flags="688062532" pid="111176" >
<job nhosts="5" ntasks="35" start="1185656221" final="1185733072" cookie="1185656221.344615" code="unknown" >b0201.nersc.gov.219169.0</job>
<host mach_name="bassi" mach_info="00C611EF4C00_AIX" >b1012</host>
<perf wtime="7.68512e+04" utime="6.96756e+04" stime="1.622e+03" mtime="1.34208e+04" gflop="1.42760e+05" gbyte="5.07507e-01" ></perf>
<switch bytes_tx="-1.00000e+00" bytes_rx="-1.00000e+00" >  </switch>
<cmdline realpath="unknown" >vasp</cmdline>
<exec><pre>
unknown
</pre></exec>
<exec_bin><pre>
unknown
</pre></exec_bin>
<env>TERM=xterm</env>
<env>AUTHSTATE=LDAP</env>
<!-- etc. ; the whole environment.. -->
<ru_s_ti>2.9520e-03 8.4945e-02 263172 0 2625 0 410 4 0 0 0 0 0 0 128 0</ru_s_ti>
<ru_s_tf>6.9676e+04 1.6223e+03 532160 128186254 36964922913 0 38544 140 0 0 0 0 0 0 274730117 1152314</ru_s_tf>
<ru_c_ti>0.0000e+00 0.0000e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0</ru_c_ti>
<ru_c_tf>0.0000e+00 0.0000e+00 0 0 0 0 0 0 0 0 0 0 0 0 0 0</ru_c_tf>
<call_mask >
</call_mask >
<regions n="1" >
<region label="ipm_noregion" nexits="1" wtime="7.6850e+04" utime="6.9674e+04" stime="1.6222e+03" mtime="1.3421e+04" >
 <hpm api="PMAPI" ncounter="6" eventset="0" gflop="1.4276e+05" >
<counter name="PM_FPU_1FLOP" > 20347621530341 </counter>
<counter name="PM_FPU_FMA" > 61206189077266 </counter>
<counter name="PM_ST_REF_L1" > 23940740396289 </counter>
<counter name="PM_LD_REF_L1" > 67707927326108 </counter>
<counter name="PM_INST_CMPL" > 207212272540824 </counter>
<counter name="PM_RUN_CYC" > 132063448636730 </counter>
</hpm>
<func name="MPI_Comm_rank" count="7" > 3.5763e-06 </func>
<func name="MPI_Comm_size" count="7" > 2.3842e-06 </func>
<func name="MPI_Bcast" count="581767" > 3.6319e+01 </func>
<func name="MPI_Barrier" count="123" > 3.0227e+00 </func>
<func name="MPI_Allreduce" count="1544761" > 1.0360e+04 </func>
<func name="MPI_Alltoall" count="9206070" > 2.9625e+03 </func>
<func name="MPI_Alltoallv" count="19970" > 5.8784e+01 </func>
</region>
</regions>
<internal rank="0" log_i="1185733073.878701" log_t="1.3111e+00" report_delta="1.1997e+00" fname="/scratch/scratchdirs/consult/log/ipm/sergio.1185656221.344615.0" logrank="-1" ></internal>
</task>