Data Documentation Template

Contents
1. Summary
2. Detailed Data Description
3. Data Access and Tools
4. Data Acquisition and Processing
5. Data Quality, Errors, and Usage Guidance
6. References and Related Publications

The documentation (which includes all metadata) that accompanies each project data set is as important as the data itself. Complete documentation is necessary to ensure appropriate data use and long-term stewardship of the data. The IPY data policy cites the Open Archival Information System (OAIS) Reference Model in defining complete documentation as "all the information necessary for data to be independently understood by users and to ensure proper stewardship of the data."

The formally structured metadata required in the IPY metadata profile are minimal. Much more information is necessary to ensure data are "independently understandable," especially given the broad interdisciplinary use of IPY data. Several metadata standards are much more comprehensive and may be sufficient if applied in sufficient detail. Notable examples include the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM) (FGDC-STD-001-1998) with Remote Sensing Extensions (RSE) (FGDC-STD-012-2002) and the ISO 19115 and related standards.

Given the interdisciplinary breadth of IPY, it is not practical to require that all projects use the same comprehensive metadata standard. Instead, this document outlines a template for the elements that should be included as part of complete data set documentation. The template is built from existing documentation templates used by the National Snow and Ice Data Center and the Earth Observing Laboratory, with some consideration of the OAIS Reference Model and recommendations in the Global Change Science Requirements for Long Term Archiving Report. This is a first draft. I request comments and additions.

The data set documentation should accompany all data set submissions and contain the information listed in the outline below.
While it will not be appropriate for every data set to have information in every documentation category, the following outline (and content) should be adhered to as closely as possible to make the documentation consistent across all data sets. It is also recommended that a documentation file accompany each preliminary and final data set submission. Development of the documentation will need input from both investigators and data managers.

1. Summary

1.1 Citation
We strongly encourage users to cite the researchers who developed the data set. Different publications will require different styles of citation, but providing an example can encourage and help users to formally cite data. Examples:
1.2 Summary
Provide the user with enough information to determine the usefulness of the data set. The summary should start with a topic sentence describing what information is in the data set (sea surface temperature, brightness temperature, snow cover, etc.). It is good to include parameters, location, and temporal coverage in the first few sentences so users can get an at-a-glance idea of what the data set is. The summary should include brief statements of the following important information.
1.3 Usage Guidance
Briefly describe what kind of applications are suitable for the data. Describe the original intent of the data collection and potentially broader applications.

1.4 Acknowledgements
The purpose of this section is to acknowledge all major participants in the data collection and assembly process who might not be covered in the citation. It can also be used to credit funding agencies. Example:
2. Detailed Data Description

2.1 Introduction
Introduce the data set and provide an overview of the contents, background, potential applications, and other general information. This should provide more detailed context than Section 1.2 Summary.

2.2 Parameter or Variable
List the scientific variables measured in the data set, along with units of measure for each. Example:
You may even choose to have a single table here that gives parameter names, units, ranges, and sample values, instead of using some of the subheadings below.

Parameter Description
Definition and units of scientific variables in the data set.

Parameter Range
The range of data values that exist for the data. Include a list of valid values for codes that indicate missing values, quality flags, errors, etc.

2.3 Data Coverage, Representation, and Resolution

Temporal Coverage
The period of time that the data collection covered, more or less continuously. Be sure to list temporal data gaps, if any. Indicate if data are ongoing "to present." A figure showing gaps for various parameters can be useful.

Temporal Resolution
Describe the optimum and typical intervals between measurements during the periods of data collection. This can be the sampling frequency for an instrument and the intervals between measurement periods. It can also be the length of time it takes to collect an entire sample or scan. Example for a remote sensing data set:
Example for a basic in-situ data set:
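As a rough illustration of how the temporal resolution and the temporal gaps to report for an in-situ record might be derived, the sketch below summarizes a list of observation times. The function name `summarize_sampling`, the `gap_factor` threshold, and the hourly sample timestamps are assumptions made for this illustration; they are not part of the template.

```python
from datetime import datetime, timedelta
from statistics import median


def summarize_sampling(timestamps, gap_factor=2.0):
    """Return (typical_interval, gaps) for a list of observation times.

    A gap is any interval longer than gap_factor times the median interval.
    This threshold is an assumed rule of thumb, not a standard.
    """
    times = sorted(timestamps)
    if len(times) < 2:
        raise ValueError("need at least two timestamps")
    # Interval lengths in seconds between consecutive observations.
    pairs = list(zip(times, times[1:]))
    intervals_s = [(later - earlier).total_seconds() for earlier, later in pairs]
    typical_s = median(intervals_s)
    # Keep the (start, end) pair of every interval that exceeds the threshold.
    gaps = [
        pair for pair, seconds in zip(pairs, intervals_s)
        if seconds > gap_factor * typical_s
    ]
    return timedelta(seconds=typical_s), gaps


# Hypothetical hourly record with observations missing between 03:00 and 07:00.
sample_times = [datetime(2008, 3, 1, hour) for hour in (0, 1, 2, 3, 7, 8, 9)]
typical_interval, gaps = summarize_sampling(sample_times)
print("typical interval:", typical_interval)
for start, end in gaps:
    print("gap from", start.isoformat(), "to", end.isoformat())
```

The typical interval can be reported as the temporal resolution, and the list of gaps can be copied into the Temporal Coverage description.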
Spatial Coverage
Provide more information than the four bounding coordinates in the metadata. Describe the coordinates of individual files, granules, or polygons. An image or map showing the coverage may be useful. Where appropriate, include official and local place names and geographic context.

Spatial Resolution
For gridded data, describe both the grid size and the actual resolution, plus any resampling performed. Describe any interpolation schemes. Example:
For vector data [xxx] In the case of data collected in the field, you can list the sampling interval if you know it. Example:
Projection
Describe the projection, ellipsoid, and Coordinate Reference System used in processing the data. [xxx]

Grid Description
Describe the method and procedure for gridding and/or binning the data (for gridded data sets). Give the dimensions of the grid and the locations of the corner points.

2.4 Format
Describe the format, structure, and dimensions of the data in detail. Example:
Sample data or images are useful.

Describing gridded binary data
The following information is particularly helpful to users of flat-binary data. Try to add as much as possible, if you can confirm it.
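To show why such details matter, the sketch below reads a headerless binary grid using only values that the documentation would have to supply. Everything specific here is an assumption for illustration: the function name `read_flat_binary_grid`, the 2 x 3 grid of little-endian 2-byte signed integers, the scale factor of 0.1, and the missing-value code -9999. A real data set would substitute its own documented values.

```python
import os
import struct
import tempfile


def read_flat_binary_grid(path, n_rows, n_cols, fmt_char="h", byte_order="<",
                          scale=1.0, missing=None):
    """Read a headerless, row-major binary grid into a list of rows.

    fmt_char and byte_order follow the struct module's conventions
    ("h" is a 2-byte signed integer, "<" is little-endian).
    """
    fmt = f"{byte_order}{n_rows * n_cols}{fmt_char}"
    expected_bytes = struct.calcsize(fmt)
    with open(path, "rb") as f:
        raw = f.read()
    # A size mismatch usually means the documented dimensions or type are wrong.
    if len(raw) != expected_bytes:
        raise ValueError(f"expected {expected_bytes} bytes, found {len(raw)}")
    values = struct.unpack(fmt, raw)
    # Replace the missing-value code with None and apply the scale factor.
    scaled = [None if v == missing else v * scale for v in values]
    return [scaled[r * n_cols:(r + 1) * n_cols] for r in range(n_rows)]


# Write a small hypothetical file so the example is self-contained.
demo_path = os.path.join(tempfile.mkdtemp(), "demo_grid.bin")
with open(demo_path, "wb") as f:
    f.write(struct.pack("<6h", 2731, 2725, -9999, 2690, 2688, 2701))

grid = read_flat_binary_grid(demo_path, n_rows=2, n_cols=3,
                             scale=0.1, missing=-9999)
for row in grid:
    print(row)
```

Without the byte order, data type, dimensions, scale factor, and missing-value code, a user could not write even this short reader with confidence.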
2.5 File and Directory Structure
Explain how the data are organized. List directories and subdirectories. If data files are provided in zipped files, explain the contents of each zipped file, especially when it contains multiple directories.

File Naming Convention
Explain the file naming convention in detail. Example:
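A naming convention that is documented precisely can also be checked and parsed by software. The sketch below assumes a purely hypothetical convention, `<dataset>_<YYYYMMDD>_v<NN>.<ext>`; the pattern, the function name `parse_filename`, and the sample file name are invented for this illustration and do not describe any actual IPY data set.

```python
import re
from datetime import datetime

# Hypothetical convention: <dataset>_<YYYYMMDD>_v<NN>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<dataset>[a-z0-9]+)_(?P<date>\d{8})_v(?P<version>\d{2})"
    r"\.(?P<ext>[a-z0-9]+)$"
)


def parse_filename(name):
    """Split a file name into its documented fields, or raise ValueError."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"{name!r} does not follow the naming convention")
    fields = match.groupdict()
    # Convert the date and version fields from text to typed values.
    fields["date"] = datetime.strptime(fields["date"], "%Y%m%d").date()
    fields["version"] = int(fields["version"])
    return fields


parsed = parse_filename("seaice_20080301_v02.bin")
print(parsed)
```

Stating each field, its width, and its allowed values in the documentation is what makes this kind of automated check possible.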
File Size
Specify the size of individual files, or provide a range of sizes if you have many files. Include a total size for the entire data set. Example:
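One possible way to gather these numbers is to walk the data directory and record each file's size. The sketch below does that and also computes an MD5 digest per file in the same pass, so the result can double as fixity information. The function name `build_manifest` and the two small demonstration files are assumptions for this illustration only.

```python
import hashlib
import os
import tempfile


def build_manifest(directory, chunk_size=1 << 20):
    """Return a sorted list of (relative_path, size_in_bytes, md5_hex)."""
    manifest = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            digest = hashlib.md5()
            # Read in chunks so large files do not have to fit in memory.
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(chunk_size), b""):
                    digest.update(chunk)
            manifest.append((os.path.relpath(path, directory),
                             os.path.getsize(path),
                             digest.hexdigest()))
    return sorted(manifest)


# Create two small hypothetical files so the example is self-contained.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.dat"), "wb") as f:
    f.write(b"hello")
with open(os.path.join(demo_dir, "b.dat"), "wb") as f:
    f.write(b"")

manifest = build_manifest(demo_dir)
sizes = [size for _path, size, _md5 in manifest]
print(f"{len(manifest)} files, {min(sizes)} to {max(sizes)} bytes each, "
      f"{sum(sizes)} bytes total")
for path, size, md5 in manifest:
    print(f"{md5}  {size:>10}  {path}")
```

The summary line gives the file count, size range, and total size for this section, and the per-file digests can be published alongside the data.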
Fixity Information
Describe any authentication mechanisms and authentication keys used to ensure that the data have not changed in an undocumented manner. Examples of fixity mechanisms include checksums, message digests, and digital signatures. The Message-Digest algorithm 5 (MD5) is a common approach.

Sample Data Record or Browse Image
Show a sample data record. For an ASCII file, explain the columns. You may have sample images for binary data. If your data set contains browse images, show a few here. Create thumbnails that link to larger images. For gridded data, you can show images derived from data, but be sure to explain that these are not representative samples of the actual data.

3. Data Access and Tools

3.1 Data Access
Describe how a user could obtain the data. The specific link or program call should be included in the metadata, but this is a place to provide more details and explain alternatives. It is especially important for non-digital data.

3.2 Volume
Specify the total volume of the product. The purpose of this field is to help users decide whether and how they could transfer the entire data set and to tell them how much storage space is required to hold it.

3.3 Software and Tools
Describe software that is available for working with this data set. Provide references and URLs to sites where the software is fully described, if possible. Try to include open source software.

4. Data Acquisition and Processing
Describe methods of data collection and processing, potentially including instrument descriptions, sampling strategies, laboratory methods, processing steps, calculated variables, theory of measurements, data sources, and/or any other appropriate information. Cite all relevant literature and include it in the "References" section. Not all of the subheadings in this section may be needed. Add, delete, or combine them as makes sense for the data set.
Theory of Measurements
Describe the theoretical basis for the way in which the measurements were made for all data used in creating this data set.

Sensor or Instrument Description
Describe the instrument(s) used to collect data. Include links to technical specifications.

Data Acquisition Methods
Describe the procedures for acquiring the data in sufficient detail that someone else with similar equipment could duplicate the measurements. Note that this is the procedure by which the data were acquired (either collected or obtained by the Principal Investigator from elsewhere). It is not the procedure by which the data were processed or computed from the originally obtained data. If there is relevant calculation information, it goes into 'Processing Steps' and that section is referenced here.

For higher level data products, this section should refer to the group or persons from whom the PI obtained the data. Include a reference to a lower level document describing the collection and processing of the data the PI acquired to produce the data set described here. If no lower level description exists, describe the method by which the original data were acquired, unless it is a routine product acquired from a commercial or government agency (e.g., a USGS map).

Data Source
For derived or value-added products, cite the original data source(s).

Derivation Techniques and Algorithms
Describe any special techniques or algorithms used. This section contains detailed descriptions of and references on models and derivation techniques. General statements go into the 'Theory of Measurements' section.

Processing Steps
Indicate the sequence of processing steps that the investigator applied to the data. If the data are processed internally to the instrumentation, you do not need to describe that processing in great detail here. This section should concentrate on the processing that is actually done by the investigator.

5. Data Quality, Errors, and Usage Guidance

5.1 Assumptions and Data Uncertainty
This relates to the theory of measurements section above, but it is wise to clearly state the common assumptions that experts may take for granted but that may not be apparent to data users from a different discipline. Similarly, it is important to describe the general uncertainties around the data that may not be readily recognized by non-experts. This could be a description of the error bars around derived measurements, limits in applied algorithms and theory, uncertainties in source data, etc.

5.2 Data Quality Assessment and Validation
Describe QA and QC processes for the data both during collection and analysis. This could include basic range checks, more comprehensive assessments, or elements of the collection protocol. Describe any data validation studies.

5.3 Error Sources
Describe specific known errors in the data and how they are indicated and addressed in processing.

5.4 Usage Guidance
Provide appropriate caveats on the use of the data. Explain how the data should NOT be used as well as how they should be used. Provide an example of how to actually work with the data.

6. References and Related Publications

6.1 Related Data Collections
List and link to other related data sets.

6.2 Related Publications
Provide descriptions of, and references and links to, any relevant technical notes, publications, and agreements related to the data set and any source data.

6.3 References Cited
Provide references and links for any publications referenced in this document.

7. Historical Information
This is information that should be maintained by the data archive to understand how the data have evolved over time and to ensure that the data are well preserved.

7.1 Provenance
This section is to document any changes to the data over time. Information should include
7.2 Reference Information
Describe any historical references or nicknames for the data set. Describe how the data set relates to other data sets, especially other versions of the same data set. Describe the versioning scheme.

8. User Feedback
This section is to capture a log of user experience with the data. Changes to the data that result from this feedback should be noted here and documented in the Provenance section.