Standardization of Data Object Design (DOD) within ARM’s software products

An important capability of the ARM program is the ability to deliver consistent data to the atmospheric science user community.  To improve that capability, an effort was undertaken to standardize the names and attributes within the ARM data.  Currently, there are inconsistencies across data streams.  Standards for field names, field attributes, and global attributes will improve the usability of the ARM data, streamline the data stream development process, and support better data analysis tools for the user community.  This document outlines the goals of this task and the guidelines for constructing field names and the mandatory field and global attributes.

The goals are:

Process for Developing Standards:
A review of the current VAP Data Object Designs (DODs) was performed to determine the scope of this task.  An initial document was drafted at PNNL and presented at the Developers meeting at Brookhaven National Laboratory.  With the consensus of the developers and managers, a bi-weekly meeting has since been held with Robin Perez, Sherman Beus, Brian Ermold, Matt Macduff, Chitra Sivaraman and Krista Gaustad.

Guidelines for construction of field names:

1.  Names will consist of lowercase letters, digits, and underscores, and will begin with a letter.  Uppercase is not to be used (the site/facility designation is the exception).
2.  The field names will be constructed by joining the names to the qualifiers using underscores.
3.  One has to be reasonable when picking field names.
4.  Field names will be concise.
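
Guidelines 1 and 2 can be checked mechanically.  The following is a minimal Python sketch, not part of any ARM tool: the function names and the regular expression are assumptions made for illustration, and the site/facility exception to the lowercase rule is not modeled.

```python
import re

# Assumed reading of guideline 1: a lowercase letter first, followed by
# lowercase letters, digits, or underscores. The site/facility exception
# mentioned in the guideline is deliberately not handled here.
FIELD_NAME_RE = re.compile(r"[a-z][a-z0-9_]*")


def is_valid_field_name(name: str) -> bool:
    """Return True if `name` satisfies the character rules of guideline 1."""
    return FIELD_NAME_RE.fullmatch(name) is not None


def join_field_name(*parts: str) -> str:
    """Join a name and its qualifiers with underscores (guideline 2)."""
    name = "_".join(parts)
    if not is_valid_field_name(name):
        raise ValueError(f"invalid field name: {name!r}")
    return name
```

For instance, join_field_name("qc", "temp") yields "qc_temp", while a name such as "Temp" or "10m_temp" is rejected.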

Hierarchy:
If a conflict arises, then the following hierarchy shall be used.

  1. [super prefix] e.g. qc, aqc, be
  2. [prefix] e.g. interpolated, calibrated, instantaneous
  3. [measurement] e.g. vapor_pressure, pressure, temperature
  4. [subcategory] e.g. head, air, up, short, hemisphere
  5. [medium] e.g. earth, satellite, sea
  6. [height] e.g. 10m, 05km
  7. [enumeration] e.g. e, w, n, s, a, b, 1, 2
  8. [quantity] e.g. mean; <field>_std is the preferred format for standard deviation
  9. [source name] e.g. smos, smet

It is important to be reasonable: field names should be as concise as possible.  For example, "temperature" should be shortened to "temp", and the name hierarchy should be used only when it is needed to differentiate fields.
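
One way to apply the hierarchy is to treat it as the left-to-right order of the components in a field name.  The Python sketch below works under that assumption; the slot identifiers and the function name are hypothetical labels chosen for this illustration, and the names it produces are illustrative rather than taken from an actual ARM datastream.

```python
# Hierarchy slots in the order listed above. The identifiers are
# hypothetical labels for this sketch, not ARM terminology.
HIERARCHY = (
    "super_prefix", "prefix", "measurement", "subcategory", "medium",
    "height", "enumeration", "quantity", "source_name",
)


def build_field_name(**parts: str) -> str:
    """Join the supplied components in hierarchy order with underscores.

    Only the components that are passed in are used, which reflects the
    advice to use the hierarchy only when it is needed.
    """
    unknown = set(parts) - set(HIERARCHY)
    if unknown:
        raise ValueError(f"unknown hierarchy slots: {sorted(unknown)}")
    ordered = [parts[slot] for slot in HIERARCHY if slot in parts]
    if not ordered:
        raise ValueError("at least one component is required")
    return "_".join(ordered)
```

Because the ordering comes from HIERARCHY rather than from the call, build_field_name(quantity="std", measurement="temp", height="10m") returns "temp_10m_std" regardless of the order in which the keyword arguments are written.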
Example of field names using some of the hierarchy:

Dimensions
If a dimension is used, then a field-level variable with the same name as the dimension should be added, with long_name and units attributes.  Examples of dimensions are bin, height, range, and depth.
Example:
      dimensions:
                        time = UNLIMITED ; // (2878 currently)
                        range = 1999 ;
      variables:
                        float range(range) ;
                        range:long_name = "Distance from transceiver to center of corresponding range_bin" ;
                        range:units = "km" ;
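
The dimension rule lends itself to an automated check.  The Python sketch below uses plain dictionaries as a stand-in for a file's contents so that it has no dependencies; the function name and the input layout are assumptions made for this illustration, not an existing ARM utility.

```python
def missing_dimension_metadata(dimensions, variables):
    """List the dimensions that lack a properly described variable.

    dimensions: iterable of dimension names.
    variables:  mapping of variable name -> mapping of attribute name -> value.
    Returns a list of human-readable problem strings (empty if all is well).
    """
    problems = []
    for dim in dimensions:
        attrs = variables.get(dim)
        if attrs is None:
            problems.append(f"{dim}: no variable named after the dimension")
            continue
        # The guideline asks for both a long name and units.
        for required in ("long_name", "units"):
            if not attrs.get(required):
                problems.append(f"{dim}: missing {required} attribute")
    return problems
```

Given only the range variable from the example, a call with the dimensions ["time", "range"] reports that time has no variable named after it and reports nothing for range.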

 

Attributes:

In general, attribute names will be in lower case, with words separated by underscores. A single lengthy comment attribute is preferred over multiple comment attributes (reference_line_n) that each hold one line of explanation.
Mandatory field attributes

Standard field attributes

Other possible attributes (not all inclusive):

                wspd:long_name = "Mean Wind Speed" ;
                wspd:units = "m/s" ;
                wspd:sensor_height = "2m" ;
Global attributes should be used to specify the heights when all sensors are at the same height.  Example: sensor_height = "10 m AGL" ;

Mandatory global attributes, standard (*) global attributes

sgpmwrlosC1.b1 : 1.17 : 20010209.000000 ;
sgp1twrmrC1.c1: Release_1_4 :  20010209.000000 ;
sgparscl1clothC1.c1 : Release_2_9 : 20010209.000000”;

  "PIR2-DIR:       30167F3",
  "Diffuse PSP:    33271F3",
  "NIP:            31876E6",
  "PSP-DS:         33703F3",
  "SKY-IR:         1845" ;

 

Obsolete global attributes:

Version Discussion:

Historically, the ARM DODs have had several version attributes.  The versions of the various system and software components are particularly important when a datastream is reprocessed and differences occur in the output.

 

Solutions:

Since the user usually has to look further to reproduce an exact version of the data, the DOD will provide only the minimum information needed to help the user.

At the discussion, these assumptions were made.

Guidelines for the source field:

 

When multiple inputs or algorithms are used to compute data fields, it may be useful to indicate the source of the input or algorithm.  In such a case, a new variable indicating the source of the data could be added.  The source field is optional and can be added at the discretion of the developer or translator or when a user might find it helpful.  “User” in this case could be the end-user of the product or the users as designated by the development team during the evaluation stage.  

 

To standardize the way VAPs write the source field, the following guidelines shall be followed:

1.      The name of the field shall be “source_<field>” where <field> is the variable field name.

Example:

                  aod(time, height)
                  source_aod(time, height)

If the source is constant with height, the source field shall be a function of time only.  Note that if the sources do not change over time, an attribute indicating the source could be added to the field itself instead of creating a separate field.

              

2.      The source_<field> variable should be accompanied by a source_<field>:description attribute containing the following text.

Example:

source_be_aod_500:description = "This field contains integer values which should be interpreted as listed. A value of -1 represents no source available." ;

 

3.      Values of the source field shall be of type “integer”.

 

4.      If there are no data samples for a particular time, the source value shall be set to -1 (negative 1).  The value ‘0’ (zero) shall not be used.  If a source preference ranking is appropriate, lower numeric values will indicate higher preference. 

 

5.      The meanings of the possible integer source values shall be indicated in the source field attributes “value_n” with the syntax “datastream_name: field_name”.  Whether to include the site and facility as part of the datastream_name is left to the discretion of the developer or translator.

Example:

                  value_1 = "mfrsr.c1:aerosol_optical_depth" ;

                  value_2 = "mfrsr.b1:aerosol_optical_depth" ;

 

6.      The source field attribute “value_n_description” is optional and can be used to provide more details on how the data was computed.

Example:

source_be_angstrom_exponent:value_5_description = "Fill gaps of 3 days or less via interpolation" ;
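
Taken together, the six guidelines describe a small set of attributes and a rule for valid values.  The Python sketch below assembles those attributes as a dictionary and checks individual source values; the function names are hypothetical, and mapping the first listed source to value_1 is an assumption based on the statement that lower values indicate higher preference.

```python
def source_field_attributes(sources, descriptions=None):
    """Build the attributes of a source_<field> variable.

    sources:      "datastream_name:field_name" strings in preference order;
                  the first entry becomes value_1 (assumed ranking).
    descriptions: optional mapping of integer value -> text, written as
                  value_n_description (guideline 6).
    """
    attrs = {
        "description": (
            "This field contains integer values which should be interpreted "
            "as listed. A value of -1 represents no source available."
        ),
    }
    for n, source in enumerate(sources, start=1):
        attrs[f"value_{n}"] = source
    for n, text in (descriptions or {}).items():
        if not 1 <= n <= len(sources):
            raise ValueError(f"no value_{n} to describe")
        attrs[f"value_{n}_description"] = text
    return attrs


def is_valid_source_value(value, attrs):
    """-1 means no source, 0 is never allowed, n > 0 needs a value_n entry."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return value == -1 or (value > 0 and f"value_{value}" in attrs)
```

With the two mfrsr sources from guideline 5 as input, the values -1, 1, and 2 pass the check, while 0 and 3 do not.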

 

Guidelines for naming quick-look plot filenames:

 

The standard convention for VAP quick-look plot filenames created at the Data Management Facility should be as follows. 

datastream.level.date.time.description.extension

Note: The delimiter should be a period (.) except within the description, where it should be an underscore (_).

Example:

sgp30ebbrE9.b1.20100101.000000.latent_heat_flux.png
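
A filename following this convention can be assembled programmatically.  The Python sketch below is one possible helper: the function name is hypothetical, and replacing whitespace and periods inside the description with underscores is an assumption about how free-text descriptions would be normalized.

```python
import re
from datetime import datetime


def quicklook_filename(datastream: str, level: str, when: datetime,
                       description: str, extension: str) -> str:
    """Build datastream.level.date.time.description.extension."""
    # Within the description the delimiter is an underscore, so runs of
    # whitespace or periods are collapsed to a single underscore (assumed).
    desc = re.sub(r"[\s.]+", "_", description.strip())
    return ".".join([
        datastream,
        level,
        when.strftime("%Y%m%d"),   # date
        when.strftime("%H%M%S"),   # time
        desc,
        extension.lstrip("."),     # accept "png" or ".png"
    ])
```

Calling quicklook_filename("sgp30ebbrE9", "b1", datetime(2010, 1, 1), "latent heat flux", "png") reproduces the example filename above.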

Summary:

Standardizing the field names in the ARM program's VAPs and ingests as presented here could significantly increase data user satisfaction and the acceptance of ARM data sets within the atmospheric science and educational communities. Implementation will begin as new VAPs and ingests are developed and prepared for release.