REPORT: Workshop 1
Establishment of Use Cases for Archived Data and Software in HEP
Date: Thursday-Friday, March 21-22, 2013
Location: CERN, Geneva, Switzerland
M.D. Hildreth, E. Long, R. Johnson, DASPOS co-organizers, with K. Bloom, R. Gardner, M. Neubauer, D. Thain, and ... DASPOS attendees
Abstract: The first DASPOS meeting was held jointly with the 7th DPHEP Workshop, with planning coordinated between DASPOS and DPHEP management. It was hosted by CERN. Most of the first day of the workshop was devoted to areas of mutual DPHEP/DASPOS interest. These included projects related to analysis preservation, with presentations on data-analysis-based outreach activities, Rivet/HepData, and an analysis preservation effort led by phenomenologists. The first day also included overviews of current data and analysis preservation efforts from BaBar and the Tevatron experiments, and an overview of data-analysis workflows from the four LHC experiments. This report summarizes findings and areas of future work based on the information presented and the discussions held during the workshop.
1. Introduction
The first DASPOS workshop was held in conjunction with the 7th DPHEP workshop at CERN on March 21-22, 2013. The topics of discussion and the talks were arranged thematically, so that those of strong mutual interest to DASPOS and DPHEP were grouped together on the first day of the workshop. Participants included representatives of each of the four LHC experiments, the Tevatron experiments, BaBar, and DESY. A separate set of discussions on outreach and high-level analysis preservation also included representatives from the theory community and Rivet/HepData.
The workshop agenda, including copies of the talks that were presented, can be found at the public link: http://indico.cern.ch/conferenceDisplay.py?ovw=True&confId=233119
1.1 Workshop Themes
Part of the scope of the DASPOS project is to explore various aspects and levels of data and knowledge preservation in High Energy Physics and beyond. A primary focus of this first workshop was an examination of the data processing and analysis workflows of the active HEP experiments. This was coupled with a review of other “high-level” analysis preservation efforts and projects related to outreach. These other activities were included because they might serve as viable methods for preserving analysis knowledge, functionality, and documentation.
More specifically, the sessions included presentation and/or consideration of the following:
- Outreach efforts, data formats, and visualization tools, presented by the four LHC experiments
  - What progress has each experiment made in developing exercises? What data are used, and what technologies are involved?
  - Is there any interest in, or benefit from, common tools or formats for analysis exercises and for data visualization, both for histograms and event displays?
- Analysis Preservation embedded in common HEP Tools
  - Rivet, HepData
    - Uses, content, potential expansion to be more inclusive
  - Les Houches Recommendations for presentation/dissemination of analysis results for new particle searches
    - Efforts at standardization, preparation of analysis database
  - RECAST
    - Not explicitly presented at the workshop, but discussed below in the context of high-level analysis preservation
- Data Processing and Analysis Workflows of HEP Experiments
  - What are the common elements and differences in the way each experiment:
    - Sets up processing elements for its data, including conditions
    - Handles the data processing itself, including processing steps, workflow creation and storage
    - Analyzes the data, including additional processing steps and the use of common formats
    - Preserves knowledge about finished analyses
The overview of the workflows and analysis efforts in the experiments was facilitated by the development and distribution of a Data Preservation questionnaire based on the Data Curation Toolkit (ref?). The full questionnaire is included as an appendix to this report.
1.2 Workshop Goals
As stated in the original DASPOS proposal, the first workshop was intended to:
- establish use cases for data access and re-use, especially for the larger DPHEP data tiers, since this will be a primary driver of the preservation architecture,
- define what data and associated information support the use cases, and
- identify a preliminary set of metadata that would serve the needs of the HEP community in accessing the various forms of archived data/algorithms.
To this end, a large fraction of the workshop was devoted to detailed presentations from each of the experiments on their processing and analysis workflows. An analysis of these inputs is presented in later sections of this report.
1.3 Overview of Report
The report follows the general flow of the themes outlined above. While it summarizes some of the factual information presented in each of the sessions, its main purpose is to distill the wide variety of information into a form that can be understood by specialists from fields other than High Energy Physics. To that end, each section gives a short summary of the content presented in the workshop and then elaborates on the overall themes that either arose from the discussion or have developed from further reflection. Section 3 examines the different processing and analysis workflows presented by the experiments. An overall analysis of knowledge preservation in HEP is presented in Section 4.

