****************************************
Process Control Manager (PCM) Interface
****************************************

The Process Configuration Manager (PCM) refers to two interfaces through
which users describe processes that produce data products (i.e., the PCM
Process GUI) and the data products that are produced. The descriptions of
the output products are referred to as Data Object Designs (DODs), thus
the second GUI is referred to as the PCM DOD GUI. Because the PCM is
intended to expedite the transfer of a scientific algorithm into the ARM
production processing system, it is expected that the desired output
product is well defined. How to define a process, and how to subsequently
run that process within a compilable project and with the
data_consolidator application, is described in the following sections.
Flow charts diagramming the key development stages are available in the
`ADI Development Steps`_ section presented at the end.

================
PCM Description
================

The Process Configuration Manager (PCM) is the master interface from
which a user can access the Process Definition and other tools used to
view, edit, and define an ADI application’s process, input, and output
configurations. Currently loaded interfaces are displayed on the right
panel, while access to ARM’s datastreams and processes is maintained on
the left panel via the **Processes** and **Datastreams** tabs. A user can
have multiple tools open and be simultaneously viewing or editing
datastreams, processes, and DODs. Each instance of an active tool is
maintained in a set of tabs located along the top of the PCM’s right
panel, as shown in the following figure, *The Process Configuration
Manager (PCM)*. By default the PCM will enable the Datastream tab of the
left panel and display the **Intro** tab on the right panel. Note that in
the left frame, below the **Filter the List** heading, the **Production**
and **Development** tabs are grayed out.
This indicates that the list of datastreams displayed is a combined list
of both production and development datastreams. To filter out the
development datastreams, select the **Production** button. To filter
datastreams by a string, enter the string into the blank cell below the
**Production** button.

.. figure:: /images/pcm_annotated_UI.png
   :width: 600 px

   The Process Configuration Manager (PCM)

From the **PCM:Datastreams** panel a user can view existing datastreams
and their associated DODs, create new datastreams, and define DODs for a
datastream. From the **PCM:Processes** panel a user can view an existing
process, or create a new process.

===================
Process Definition
===================

Defining an ARM process consists of defining its inputs and outputs, and
documenting where it will run, through the Process Definition Tool
component of the ADI PCM. A Process Definition includes definitions of
inputs, outputs, and operating parameters that relate to the process
itself rather than to an input, output, or transformation being applied
to an input. A summary of the information needed to define an ADI
process, and information helpful in completing the process definition,
is presented below.

**Process Name** *required*
   Name of the process. The executable will be the name followed by
   '\_vap.'

----------------------------------------
Process Locations and Other Options Form
----------------------------------------

.. figure:: /images/pcm_Process_Locations_Options_form.png
   :width: 600 px
   :alt: Process Locations and Other Options Form

   Locations and Other Options Form

**ARM Facilities for which this Process Should Run:** *required*
   Each ARM site/facility pairing for which the process is valid to run.
   The process can only run at facilities that are documented in the
   DSDB.

**Automated Email:** *required*
   Email address to receive error and warning messages produced by the
   process.
**Processing Interval** *required*
   The number of seconds of data that each iteration takes across the
   ADI modules that follow initialization and precede the finishing
   stage (retrieve, merge, transform, create output datasets, store
   data). Defaults to a single day (86400 seconds). If set to 0, the
   size of the chunk of data processed through the retrieval, merge, and
   process modules equals the span between the begin and end dates, plus
   any time offsets.

-------------------
Inputs/Outputs Form
-------------------

.. figure:: /images/pcm_Process_Input_Output_form.png
   :width: 600 px
   :alt: Process Inputs/Outputs Form

   Inputs/Outputs Form

**Process Type:** *required*
   If the process will be retrieving data from netCDF data files, select
   ADI. The Basic type is for non-netCDF input files. Load input netCDF
   files into the DOD as needed, as described in the sections that
   follow.

**Retrieval Definition:** *required*
   Input datastreams for an ADI type process are specified as part of
   the retrieval definition process. Select the Edit Retrieval button
   and complete the Retrieval Definition Form as described in
   `Specifying Variables to Retrieve and Conversions and Transforms to
   Apply`_.

**Process Output Datastream Classes:** *required*
   Select an existing datastream class and data level from the drop-down
   list of available values, or enter a new datastream base platform
   name and data level that conforms to ARM naming standards. If a DOD
   for that datastream is in the DSDB, an expandable reference that is
   also a link to the DOD interface page is displayed.

-----------------------
Defining a New Process
-----------------------

To define a new process, perform the following steps.

1. Open the Processing Configuration Manager (PCM) and log in from
   https://engineering.arm.gov/pcm/Main.html. Use your ARM wiki user
   name and password.

   .. image:: /images/pcm_Login_to_PCM.png
      :width: 624 px

2. Select the **Processes** tab in the upper left panel of the PCM as
   shown:

   .. image:: /images/pcm_Selecting_new_Process.png
      :width: 624 px

3. Define a new VAP.

   - Enter a name for the process.
     For this tutorial, we will name the process example_vap.

   - Select the process type as VAP.
   - Enter the facilities at which the VAP should run by selecting site
     and facility pairs from the provided drop list. To display the drop
     list, start typing the name of the facility. You can then select
     the facility from the list of candidates. For this tutorial, select
     NSA C1 to start. To delete a selection, click the X beside the item
     name. If a site is not listed it needs to be loaded into the
     database. Contact Sherman Beus for further information.
   - Enter the output datastream name, which for this tutorial is
     examplevap.c1 (note that site and facility are not included in this
     definition). As in the previous step, a drop list will display
     candidate datastream names.
   - Select the ‘Save’ button to save the entries to the DSDB.

In the example shown below, the VAP process created is ‘example_vap’; it
runs at the sgp C1, sgp B1, and nsa C1 facilities, and produces the
output datastream examplevap.c1. If the examplevap.c1 datastream has not
been previously defined, saving the process information to the DSDB will
result in the addition of examplevap.c1 to the list of datastreams
available from the PCM:Datastreams view. Note the Process Definition
form is labeled as an ‘example_vap Process Form’ tab at the top of the
right panel next to the ‘Intro’ tab.

.. image:: /images/pcm_process_definition_tool_edit_retrieval.png
   :width: 624 px

-----------------------------
Updating an Existing Process
-----------------------------

To rename an existing process, perform steps one and two from `Defining
a New Process`_, edit the name of the process, and save the change.

It is not possible to duplicate a process. However, the Text
Export/Import button displayed in the lower right hand corner of the
Retriever Editor form can be used to copy all the Retriever Table
database entries into another process.
To fully duplicate the process, the attributes associated with the
process will need to be reentered into the new process’s Locations and
Other Options, and Inputs/Outputs forms.

=========================================================================
Specifying Variables to Retrieve and Conversions and Transforms to Apply
=========================================================================

At the most basic level, defining the inputs to a VAP consists of
documenting the name of a variable and the datastream from which it
should be retrieved. Historically, a significant effort was expended
performing pre-analysis data consolidation and transformations to
prepare input data for scientific analysis. To minimize, if not
eliminate, the need for VAP developers to perform such tasks, ADI allows
a user to

- Define preferred and alternative input datastream sources
- Assign a generic name to retrieved variables that will be referred to
  in the automated source code
- Use a simple check box to retrieve companion QC variables
- Apply unit and data type changes to the data as part of the retrieval
  process.

Additional control is also provided to define input data source
preference by site/facility pairing or time range dependencies.

The inputs of all VAPs must be specified using a retrieval process. As
such, by default the ‘This process uses a retrieval for its input
configuration’ box will be checked and an ‘Edit Retrieval’ button should
be evident in the lower left side of the right panel. Selecting this
button will bring up the Retrieval Definition form, as shown in the
following example screen capture. Note that the Retrieval Definition
form has replaced the Process Definition form, but it is still organized
under the ‘example_vap VAP Process’ tab. To return to the main Process
Definition form, select the ‘Done’ button at the bottom left of the
right panel.
.. note:: Retriever data will not be stored in the DSDB until the
   ‘Save’ button in the Process Definition form has been selected.

.. image:: /images/pcm_retrieval_definition_form.png
   :width: 624 px

------------------------------------
Retrieval Definition Table Overview
------------------------------------

The Retrieval Definition form allows the user to not only specify the
variable and data source from which to retrieve a variable, but also to
perform some basic transformations of units and data type. These options
are checkboxes in the bar above the table; selecting an item adds the
corresponding data entry column to the table. In addition to the
transformations, the bar also allows the user to retrieve data for a
particular variable for some extra time before and/or after the process
period specified on the command line, and to automatically retrieve the
companion QC variable. A description of each of the columns in the
Retrieval Definition table is given below.

+------------------------------------------------------------------------+
| ADI Retrieval Definition Form Parameters                               |
+======================+==============+==================================+
| **Process Element**  | **Required** | **Description and Comments**     |
+----------------------+--------------+----------------------------------+
| Source(s)            | Yes          | Datastream Source(s) is the      |
|                      |              | datastream(s) from which the     |
|                      |              | value(s) for the variable should |
|                      |              | be retrieved.                    |
|                      |              |                                  |
|                      |              | Populated via the Data Sources   |
|                      |              | Definition form. A single value  |
|                      |              | can be retrieved from a          |
|                      |              | prioritized list of preferred    |
|                      |              | and alternate datastreams. (In   |
|                      |              | Figure 3.2 the first_cbh         |
|                      |              | variable is retrieved from       |
|                      |              | either the vceil25k.a1 or        |
|                      |              | vceil25k.b1 datastream, based on |
|                      |              | the indicated conditions, and    |
|                      |              | correlated to a user defined     |
|                      |              | variable ‘cloud_base_height’.)   |
+----------------------+--------------+----------------------------------+
| Variable Name        | Yes          | The Variable Name consists of    |
|                      |              | the user defined name of the     |
|                      |              | variable to be retrieved, and an |
|                      |              | indication of whether finding    |
|                      |              | the variable in one of the       |
|                      |              | specified input data sources is  |
|                      |              | a requirement that must be met   |
|                      |              | for the VAP to successfully run. |
|                      |              |                                  |
|                      |              | Variable names in the ‘Variable  |
|                      |              | Name’ column must be unique. If  |
|                      |              | the ‘Required’ check box is      |
|                      |              | marked, the VAP process will     |
|                      |              | fail to run for a given          |
|                      |              | observation (i.e., input data    |
|                      |              | file) unless the variable        |
|                      |              | specified is successfully        |
|                      |              | retrieved; an asterisk will      |
|                      |              | follow the Variable Name.        |
|                      |              |                                  |
|                      |              | This is the name by which the    |
|                      |              | retrieved data will be referred  |
|                      |              | to in the DSDB and in            |
|                      |              | auto-generated code. It is not   |
|                      |              | necessary for this name to match |
|                      |              | the name in the datastream(s)    |
|                      |              | from which the variable is       |
|                      |              | retrieved.                       |
|                      |              |                                  |
|                      |              | Coordinate dimension variables   |
|                      |              | (i.e., time, height, range,      |
|                      |              | etc.) should not be included in  |
|                      |              | the Retrieval Definition table,  |
|                      |              | as all coordinate dimensions of  |
|                      |              | retrieved variables are          |
|                      |              | automatically retrieved. This    |
|                      |              | automatic retrieval is only      |
|                      |              | successful when the dimension    |
|                      |              | name and variable name in the    |
|                      |              | input datastream file are        |
|                      |              | identical.                       |
+----------------------+--------------+----------------------------------+
| Coord System         | No           | Coord System is the name         |
|                      |              | assigned by the developer to the |
|                      |              | coordinate system for a given    |
|                      |              | variable.                        |
|                      |              |                                  |
|                      |              | The parameters associated with a |
|                      |              | coordinate system are assigned   |
|                      |              | via the Coordinate System        |
|                      |              | Definition Form. A               |
|                      |              | transformation method must be    |
|                      |              | defined for each dimension of a  |
|                      |              | variable’s coordinate system.    |
|                      |              |                                  |
|                      |              | ADI supports two methods of      |
|                      |              | assigning a coordinate system to |
|                      |              | a given dimension: (1) a uniform |
|                      |              | system (a coordinate system      |
|                      |              | characterized by a constant      |
|                      |              | interval between all samples of  |
|                      |              | the dimension); (2) a mapping (a |
|                      |              | coordinate system not explicitly |
|                      |              | defined, but indicated by        |
|                      |              | selecting a coordinate variable  |
|                      |              | from another datastream onto     |
|                      |              | which a retrieved variable’s     |
|                      |              | dimension will be transformed).  |
|                      |              | These are more fully discussed   |
|                      |              | in `Coordinate System Definition |
|                      |              | Form Overview`_.                 |
|                      |              |                                  |
|                      |              | It is recommended that all       |
|                      |              | retrieved variables passed       |
|                      |              | through to an output datastream, |
|                      |              | even when the input and output   |
|                      |              | coordinate systems are           |
|                      |              | identical, have an explicitly    |
|                      |              | named coordinate system defined  |
|                      |              | using a mapping or static        |
|                      |              | values. For cases where the      |
|                      |              | output coordinate system is the  |
|                      |              | same as that of the input        |
|                      |              | datastream, it should be defined |
|                      |              | as a mapping onto itself. This   |
|                      |              | will fill gaps in data to create |
|                      |              | a more complete file.            |
+----------------------+--------------+----------------------------------+
| Outputs              | Yes          | The name of the output           |
|                      |              | datastream(s) and level(s) to    |
|                      |              | which a retrieved variable will  |
|                      |              | be propagated as part of the     |
|                      |              | data consolidation process, and  |
|                      |              | the name of the variable as it   |
|                      |              | will be found in the output      |
|                      |              | datastream(s).                   |
|                      |              |                                  |
|                      |              | Populated via the Output Field   |
|                      |              | Mapping Form. The output         |
|                      |              | datastream(s) are prepopulated   |
|                      |              | with all possible output         |
|                      |              | datastreams documented in the    |
|                      |              | Inputs/Outputs section of the    |
|                      |              | Process Definition Form.         |
|                      |              |                                  |
|                      |              | For a retrieved variable to      |
|                      |              | exist in an output datastream,   |
|                      |              | the name must be entered into    |
|                      |              | the empty cell adjacent to the   |
|                      |              | datastream name and level in the |
|                      |              | Output Field Mapping Form.       |
+----------------------+--------------+----------------------------------+
| Units                | No           | Specifies the units into which   |
|                      |              | the retrieved data will be       |
|                      |              | converted. Units are converted   |
|                      |              | using Unidata’s UDUNITS library. |
|                      |              |                                  |
|                      |              | The DEFAULT value results in the |
|                      |              | units staying the same as found  |
|                      |              | in the input file from which the |
|                      |              | variable is retrieved.           |
|                      |              |                                  |
|                      |              | Units are entered free form.     |
|                      |              | Please reference Unidata’s web   |
|                      |              | page for further information:    |
|                      |              | http://www.unidata.ucar.edu/     |
|                      |              | software/udunits/udunits-2/      |
|                      |              | udunits2.html.                   |
+----------------------+--------------+----------------------------------+
| Data Type            | No           | A drop list of possible data     |
|                      |              | types into which the retrieved   |
|                      |              | data can be converted.           |
|                      |              |                                  |
|                      |              | If no value is provided the      |
|                      |              | data type will default to type   |
|                      |              | float.                           |
|                      |              |                                  |
|                      |              | If the data type remains at its  |
|                      |              | default value through the        |
|                      |              | population of the Data Sources   |
|                      |              | Definition form, and a field is  |
|                      |              | selected from the drop list of   |
|                      |              | available values, the data type  |
|                      |              | will be updated to the type of   |
|                      |              | the selected field as found in   |
|                      |              | the specified datastream.        |
|                      |              |                                  |
|                      |              | If the default value is          |
|                      |              | overridden in the Retrieval      |
|                      |              | Definition table, the data type  |
|                      |              | will not be updated as a result  |
|                      |              | of field selections in the Data  |
|                      |              | Sources Definition form.         |
+----------------------+--------------+----------------------------------+
| QC                   | Yes          | Indicates whether the companion  |
|                      |              | QC variable will be retrieved in |
|                      |              | addition to the variable noted   |
|                      |              | in the Variable Name, and        |
|                      |              | whether successfully finding the |
|                      |              | companion QC variable is a       |
|                      |              | requirement for the VAP to run.  |
|                      |              |                                  |
|                      |              | It is assumed that the name of   |
|                      |              | the companion QC variable will   |
|                      |              | be equal to the name of the      |
|                      |              | variable in the input datastream |
|                      |              | file preceded by 'qc\_'.         |
|                      |              |                                  |
|                      |              | If the ‘Required’ check box is   |
|                      |              | marked, the VAP process will     |
|                      |              | fail to run for a given          |
|                      |              | observation (i.e., input data    |
|                      |              | file) unless both the variable   |
|                      |              | and its QC are successfully      |
|                      |              | retrieved.                       |
+----------------------+--------------+----------------------------------+
| Offsets(Seconds)     | No           | Allows a user to retrieve        |
|                      |              | additional data for each         |
|                      |              | processing interval, either      |
|                      |              | before the interval or after;    |
|                      |              | i.e., before the begin date or   |
|                      |              | after the end date entered at    |
|                      |              | the command line at run time.    |
|                      |              |                                  |
|                      |              | If the input and output bins do  |
|                      |              | not both line up with the        |
|                      |              | processing interval boundaries,  |
|                      |              | to be absolutely sure you get    |
|                      |              | all the input data you need      |
|                      |              | outside the edge of a processing |
|                      |              | interval you will need to set    |
|                      |              | the offsets to [size of input    |
|                      |              | bin] + [size of transformed      |
|                      |              | bin]. This will retrieve enough  |
|                      |              | data even in the worst case of   |
|                      |              | diametrically opposed alignments |
|                      |              | (an alignment of 0.0 in one, and |
|                      |              | 1.0 in the other).               |
|                      |              |                                  |
|                      |              | The begin_date and end_date      |
|                      |              | values are for the "current      |
|                      |              | processing interval" and are not |
|                      |              | adjusted by the offsets. All     |
|                      |              | records with times before        |
|                      |              | begin_date or after end_date are |
|                      |              | records within the specified     |
|                      |              | offsets (for normal daily        |
|                      |              | processing these would be from   |
|                      |              | the previous day or the next     |
|                      |              | day). Note that begin_date and   |
|                      |              | end_date are input parameters to |
|                      |              | all user hooks.                  |
|                      |              |                                  |
|                      |              | If an offset of 60 sec is        |
|                      |              | defined at the start, and an     |
|                      |              | offset of 60 sec at the end, for |
|                      |              | a sample interval of 60 sec,     |
|                      |              | then the samples will go from 0  |
|                      |              | to 1441. But the output files    |
|                      |              | created will still be 1440 in    |
|                      |              | size and consist of samples 1 to |
|                      |              | 1440.                            |
|                      |              |                                  |
|                      |              | Typically used to provide a      |
|                      |              | buffer of data to a type of      |
|                      |              | analysis that needs to see a     |
|                      |              | larger window of data than the   |
|                      |              | processing interval of the ADI   |
|                      |              | process. Despite the processing  |
|                      |              | being over the entire period,    |
|                      |              | the output file will only cover  |
|                      |              | the processing interval.         |
+----------------------+--------------+----------------------------------+

.. image:: /images/pcm_retrieving_single_value.png
   :width: 624 px

--------------------------------------
Data Sources Definition Form Overview
--------------------------------------

The Data Sources Definition form allows a user to define the source(s)
of the data to retrieve and assign to the user defined variable. It
allows for lists of preferred and alternate data sources, multiple
possible variable names, and location and time dependencies. A
description of each of the columns in the Data Sources Definition form
is given below.
+------------------------------------------------------------------------+
| ADI Data Sources Definition Form Parameters                            |
+======================+==============+==================================+
| **Process Element**  | **Required** | **Description and Comments**     |
+----------------------+--------------+----------------------------------+
| Priority             | No           | Integer representation of        |
|                      |              | priority when alternative        |
|                      |              | Datastream Sources are           |
|                      |              | specified.                       |
|                      |              |                                  |
|                      |              | When priority is not populated,  |
|                      |              | the first row is the highest     |
|                      |              | priority and the last is the     |
|                      |              | lowest.                          |
|                      |              |                                  |
|                      |              | Dragging and dropping the rows   |
|                      |              | into the desired order is        |
|                      |              | another way to adjust priority.  |
+----------------------+--------------+----------------------------------+
| Datastream Class     | Yes          | Datastream from which the        |
|                      |              | variable with the name noted in  |
|                      |              | the ‘Field(s)’ column will be    |
|                      |              | retrieved.                       |
|                      |              |                                  |
|                      |              | Must be populated before any of  |
|                      |              | the other elements in the Data   |
|                      |              | Sources Definition form can be   |
|                      |              | populated.                       |
+----------------------+--------------+----------------------------------+
| Field(s)             | Yes          | Name of the variable to          |
|                      |              | retrieve, as found in the        |
|                      |              | datastream defined as the        |
|                      |              | Datastream Class.                |
|                      |              |                                  |
|                      |              | Initially populated with a       |
|                      |              | default value equal to the user  |
|                      |              | defined Variable Name from the   |
|                      |              | Retrieval Definition form. This  |
|                      |              | default value is noted by        |
|                      |              | brackets.                        |
|                      |              |                                  |
|                      |              | If the datastream is loaded      |
|                      |              | into the history database,       |
|                      |              | clicking on the Fields cell      |
|                      |              | will bring up a drop list        |
|                      |              | populated with all possible      |
|                      |              | variable names. If not, the      |
|                      |              | user should enter the desired    |
|                      |              | variable name followed by a .    |
|                      |              |                                  |
|                      |              | If more than one variable is     |
|                      |              | entered into the Fields column,  |
|                      |              | the retriever searches the       |
|                      |              | input datastream file for each   |
|                      |              | of the variables in the order    |
|                      |              | listed, until one is found.      |
|                      |              |                                  |
|                      |              | The variable names shown in the  |
|                      |              | drop list reflect all the        |
|                      |              | variables that have existed for  |
|                      |              | that datastream over all time,   |
|                      |              | not just the variables in the    |
|                      |              | datastream’s latest DOD.         |
+----------------------+--------------+----------------------------------+
| Location Dependency  | No           | Used when the datastream from    |
|                      |              | which to retrieve data is a      |
|                      |              | function of the site/facility    |
|                      |              | at which the VAP is being run.   |
+----------------------+--------------+----------------------------------+
| Time Dependency      | No           | Used when the datastream from    |
|                      |              | which to retrieve data is a      |
|                      |              | function of the period for       |
|                      |              | which the VAP process is         |
|                      |              | running.                         |
|                      |              |                                  |
|                      |              | If a begin or end time           |
|                      |              | dependency is not selected, the  |
|                      |              | time dependency defaults to the  |
|                      |              | beginning or end of the          |
|                      |              | datastream, respectively.        |
+----------------------+--------------+----------------------------------+

An example of both a location and time dependency is illustrated in the
preceding figure. In this example, when the VAP is run for sgpB4, the
user defined variable ‘cloud_base_height’ will be correlated to the
first_cbh variable in the vceil25k.a1 datastream.
If it is not running at sgpB4 and the date being processed falls before
April 1, 2001, the user defined variable ‘cloud_base_height’ will be
correlated to the variable ‘first_cbh’ in the vceil25k.a1 datastream.
For process times of April 1, 2001 or later, when processing at sites
other than sgpB4, the user defined variable ‘cloud_base_height’ will be
correlated to the first_cbh variable in the vceil25k.b1 datastream.

-----------------------------------
Output Field Mapping Form Overview
-----------------------------------

This form is accessed by double clicking a cell in the Output(s) column
of the Retriever Editor. It consists of a row for each of the possible
output datastreams, with a drop box containing all the variables in that
output DOD. To associate a variable in the Retriever Editor with a
specific output variable, simply select the desired variable from the
drop box next to the datastream. This will result in the values
associated with the retrieved variable being mapped to the selected
variable in the output datastream.

.. image:: /images/pcm_output_field_mapping.png
   :width: 624 px

-------------------------------------------
Coordinate System Definition Form Overview
-------------------------------------------

In most cases, a new coordinate system can be fully defined via the
Coordinate System Definition form. To transform a variable to a new
coordinate system means to define new values for one or more of the
variable’s dimensions, and to update the variable’s values to reflect
the new ‘grid’. The coordinate system of the retrieved variable will be
referred to as the ‘source’; the coordinate system of the new grid will
be referred to as the ‘target’. The form supports three transformation
types: (1) averaging, (2) interpolation, and (3) nearest subsample. The
parameters that can be specified via the form are documented in the
following tables for each transformation type.
+-------------------------------------------------------------------------+
| General Coordinate System Definition Form Parameters                    |
+======================+===============+==================================+
| **Process Element**  | **Required**  | **Description and Comments**     |
+----------------------+---------------+----------------------------------+
| Variable(x,y, ...n)  | Yes           | Name of each dimension of the    |
| where x,y, ...n      |               | retrieved variable.              |
| represent the        |               |                                  |
| dimensions that make |               | The order of the dimensions      |
| up the target        |               | must match the order of the      |
| coordinate system.   |               | dimensions of the retrieved      |
|                      |               | variable.                        |
|                      |               |                                  |
|                      |               | If the name of a dimension is    |
|                      |               | to be changed, the new name      |
|                      |               | should be entered.               |
+----------------------+---------------+----------------------------------+
| Coordinate system    | Yes           | The name of the coordinate       |
| name                 |               | system as stored in the          |
|                      |               | CDSTrans structure and named in  |
|                      |               | ADI templater generated header   |
|                      |               | files.                           |
|                      |               |                                  |
|                      |               | If no transformation is          |
|                      |               | performed on a retrieved         |
|                      |               | variable’s dimensions, then a    |
|                      |               | CDSin structure is used to       |
|                      |               | store the information and a      |
|                      |               | coordinate system name is not    |
|                      |               | needed.                          |
+----------------------+---------------+----------------------------------+
| Units                | No            | If set, the dimension will be    |
|                      |               | converted to the indicated       |
|                      |               | units prior to the               |
|                      |               | transformation.                  |
+----------------------+---------------+----------------------------------+
| Data type            | No            | If set, the data type will be    |
|                      |               | converted to the type indicated  |
|                      |               | prior to the transformation.     |
+----------------------+---------------+----------------------------------+
| Use mapping          | No            | Control button.                  |
|                      |               |                                  |
|                      |               | If selected, it updates the      |
|                      |               | form to display a table from     |
|                      |               | which the user can select the    |
|                      |               | datastream’s grid onto which     |
|                      |               | the indicated dimension will be  |
|                      |               | mapped.                          |
|                      |               |                                  |
|                      |               | If not selected, drop boxes and  |
|                      |               | cells necessary to define a      |
|                      |               | uniform grid are displayed.      |
+----------------------+---------------+----------------------------------+

+-------------------------------------------------------------------------+
| Uniform Grid Coordinate System Definition Form Parameters               |
+======================+===============+==================================+
| **Process Element**  | **Required**  | **Description and Comments**     |
+----------------------+---------------+----------------------------------+
| Transform type       | No            | Allows the user to select the    |
|                      |               | type of transform applied, such  |
|                      |               | as average, interpolation,       |
|                      |               | subsample, etc.                  |
|                      |               |                                  |
|                      |               | By default, if output bins are   |
|                      |               | larger than input bins the data  |
|                      |               | is averaged; if output bins are  |
|                      |               | smaller the data is              |
|                      |               | interpolated; if the bin size    |
|                      |               | is the same no transformation    |
|                      |               | is applied.                      |
+----------------------+---------------+----------------------------------+
| Bin alignment        | No            | Indicates where in the bin the   |
|                      |               | coordinate variable for the      |
|                      |               | dimension is located, in the     |
|                      |               | context of ‘beginning, middle,   |
|                      |               | and end’ values.                 |
|                      |               |                                  |
|                      |               | Default value is middle.         |
+----------------------+---------------+----------------------------------+
| Interval             | Yes *         | Specifies the difference         |
|                      |               | between two values of the given  |
|                      |               | coordinate variable used to      |
|                      |               | generate a uniform grid.         |
+----------------------+---------------+----------------------------------+
| Start                | Yes *         | The value of the coordinate      |
|                      |               | dimension for the first element  |
|                      |               | in the output grid.              |
+----------------------+---------------+----------------------------------+
| End                  | Yes *         | The value of the coordinate      |
|                      |               | dimension for the last element   |
|                      |               | in the output grid.              |
+----------------------+---------------+----------------------------------+
| Length               | Yes *         | The number of bins, or distinct  |
|                      |               | values for the coordinate        |
|                      |               | dimension.                       |
|                      |               |                                  |
|                      |               | For the dimension time this      |
|                      |               | equals the number of samples in  |
|                      |               | the file.                        |
+----------------------+---------------+----------------------------------+

\* For the interval, start, end, and length parameters, the user sets
three of the four and the last is calculated and automatically set.

+-------------------------------------------------------------------------+
| Mapped Grid Coordinate System Definition Form Parameters                |
+======================+===============+==================================+
| **Process Element**  | **Required**  | **Description and Comments**     |
+----------------------+---------------+----------------------------------+
| Datastream group     | Yes           | The datastream to map to is      |
|                      |               | determined by the user entering  |
|                      |               | the name of the datastream       |
|                      |               | group for which the target       |
|                      |               | datastream is the highest        |
|                      |               | priority datastream.             |
+----------------------+---------------+----------------------------------+

In addition to the parameters provided in the form, additional
parameters can be defined in a configuration file to further refine the
transformation. Each of the transformation types, and the flat file that
can be used to define them, are discussed in detail in `Transforming or
Regridding Retrieved Variables onto a New Coordinate System`_.
The entries on the Coordinate System Definition form support the two most
common types of transformations, averaging and interpolation. Through this
form, the target grid can be defined in one of two ways:

1. By specifying a constant interval between values, a start value, an end
   value, and the total number of samples.
2. By selecting an existing grid on which to map a variable.

The former is referred to as a uniform transformation; the latter, a mapped
transformation.

Unless the transform type is explicitly defined in the transform
configuration file, the libraries determine whether an averaging or
interpolation transformation is needed. If the target grid bins are larger
than the source grid bins, the data will be averaged to match the new grid.
If the target grid bin size is smaller, then interpolation will be applied.
If either or both grids are irregular, then ADI will attempt to guess which
default transformation should be used based on the average interval over the
whole span of the grid.

.. image:: /images/pcm_uniform_transform_view_coordinate_system.png
   :width: 624 px

The coordinate system in the figure above is an example of a uniform
transformation of the time dimension. It has been assigned the name
"thirty_second" and transforms the dimension time onto a uniform grid that
starts at 0 seconds and grows in increments of 30 to 86370, for a total of
2880 values.

.. image:: /images/pcm_datastream_mapping_transform_coordinate.png
   :width: 624 px

------------------------------------------
Populating the Retrieval Definition Table
------------------------------------------

Retrieval Definition table variables and data sources can be populated by
either:

- Specifying the variable names and datastream sources by typing in the
  fields in the Retrieval Definition Table.
- Accessing the DOD of an existing datastream using the Datastreams tab on
  the left panel of the Process Configuration Manager and dragging and
  dropping variables to retrieve into the table.
Manual entry is more efficient when there is more than one datastream from
which to retrieve the variable. When a variable’s source is a single
datastream (no alternate sources if that datastream is unavailable), it is
more efficient to access the DOD of the input datastream and drag and drop
variables onto the Retrieval Definition table.

1. To enter the Retrieval Definition form:

   a. From the Process Definition Tool (Figure 2.4) select the ‘This process
      uses a retrieval for its input configuration’ button.
   b. Select the ‘Edit Retrieval’ button.

2. Populate the Retrieval Definition Table.

.. note:: Do not retrieve netCDF standard (lat, lon, alt) or coordinate
   dimension variables (time, height, range) for a retrieved variable, as
   these will be automatically retrieved.

Manual Entry of Input Data Variables and Sources
-------------------------------------------------

1. Select the green plus symbol located to the left of the table form.
2. Select the ‘custom_field_1’ variable and enter the name of the variable
   to retrieve.
3. Indicate whether the variable must be found for the VAP to run via the
   ‘Required’ check box.
4. Select ‘Source(s)’ [NONE] in the Datastream column.
5. Select the pencil icon to bring up the Data Sources Definition form.
6. Select the Datastream column corresponding to the Field with the value
   of the variable you just defined to bring up a drop list of possible
   datastreams.

   If the data source is a single datastream with no alternative sources:

   - Select the datastream from which the variable should be retrieved,
     then proceed to step ‘g’.

   If the data source is a single datastream with alternative sources based
   on datastream availability:

   - Select the most preferred datastream from which the variable should be
     retrieved, then proceed to step ii.
   If the data source is a single datastream with alternative sources based
   on location or time dependencies:

   A. Select the datastream from which the variable should be retrieved and
      define the most restrictive dependencies.
   B. Create a new row in the Data Sources Definition table by either
      selecting the green plus symbol to the left of the table, or by
      duplicating an existing row in the table by selecting the paper
      symbol on the left.
   C. Update the Datastream Class column of the new row to reflect the next
      most preferred (or next most restrictive) data source.
   D. Repeat the addition of new rows until options are exhausted.
   E. Review data source priority and populate the ‘Priority’ column as
      necessary.

7. If the name of the variable found in the Datastream Class datastream
   does not match the default value, update the entry in the ‘Field’ and
   select the desired variable name.
8. If a second value is to be retrieved and correlated to the user-defined
   Variable Name in the Retrieval Definition form (meaning the user-defined
   variable in the Retrieval Definition form will be an array of more than
   one value), specify the data sources and associated values of the
   additional values as follows:

   A. Select the ‘Show Advanced Controls’ button in the upper left corner
      of the ‘Data Sources Definition’ window. This will bring up
      additional icons along the top left of the Query window to add,
      close, delete, and adjust the order of additional queries.
   B. Define a new query to retrieve the additional data value by selecting
      either the green plus or the paper sheet icon to add a new or
      duplicate query.
   C. For the new query, define the data source(s) and update the field
      name.

9. Close the Data Sources Definition window by selecting the ‘x’ in the
   upper right corner of the window. Return to the Retrieval Definition
   form and specify additional variables to retrieve by adding new rows to
   the Retrieval Definition table.
Dragging and Dropping Input Data Variables and Sources from Existing DODs
--------------------------------------------------------------------------

You can populate the Retrieval Definition Table by dragging and dropping
from input datastream DODs. For example, you can drag and drop the desired
variable from a datastream’s variable list into the Retrieval Definition
table. The Source(s), Variable Name, and QC retrieval status will be
populated. Update these as required.

1. Select the Datastream tab and locate the datastream from which to
   retrieve the variable.
2. Select the triangle next to the highest DOD version of the desired input
   datastream to list the dimensions, variables, and global attributes
   associated with the DOD.
3. Select the triangle next to the Variables to expand the variables.
4. Select the variable to retrieve with a single click, drag the variable
   from the left frame, and drop it into the Retrieval Definition frame on
   the right (Figure 3.3).
5. Update the variable’s Source(s), Variable Name, Units, Data Type, QC,
   and Offset values as necessary.

.. image:: /images/pcm_drag_drop_variable.png
   :width: 624 px

For the example VAP we will build in this tutorial, we will retrieve the
first_cbh, qc_first_cbh, and backscatter variables from the vceil25k.b1
datastream. The first_cbh will be saved into a user-defined variable name
of ‘cloud_base_height’ and written to the output netCDF file with that
name. If that datastream is unavailable, the variables will be retrieved
from vceil25k.a1. The units of first_cbh will be converted to centimeters,
and successful retrieval of the QC variable will be required for the
example_vap process to run. The Retrieval Definition table for example_vap,
with the Data Sources Definition form open for the first_cbh variable, is
shown in the following figure.

.. image:: /images/pcm_data_source_def_form_VAP.png
   :width: 624 px

The duplicate entry icon (paper sheets icon) in the Retrieval Definition
form was used to add the backscatter variable, since it is retrieved using
the same Data Sources Definition query (i.e., the same sets of possible
input datastreams). The duplicate entry was updated as appropriate for the
backscatter variable (updating the Variable Name, Field(s), and Units). The
coordinate dimensions of the retrieved variables (time and range) and lat,
lon, and alt are not included in the Retrieval Definition table, as they
will be automatically retrieved. Note that the Location Dependency and Time
Dependency check boxes in the Data Sources Definition have been deselected,
as they are not applicable to this example.

----------------------------------------------------------------------------
Transforming or Regridding Retrieved Variables onto a New Coordinate System
----------------------------------------------------------------------------

This section will have documentation on details of transformation.

----------------------------------------
Saving the Retrieval Definition to DSDB
----------------------------------------

The input data retrieval specifications are saved to the DSDB, from which
they are accessed by the ADI templater application, create_adi_project, to
create the project source code files used at run time by the VAP. Note in
Figure 3.5 that the user-defined variable names that are retrieved from the
input datastreams are summarized above the ‘Edit Retrieval’ button. To save
retrieval data to the DSDB, perform the following steps.

1. Select the ‘Done’ button in the lower left corner of the ‘Retrieval
   Definition’ form to return to the ‘Process Definition’ form.
2. Select the ‘Save’ button in the lower left corner of the ‘Process
   Definition’ form.

.. image:: /images/pcm_completed_proc_def_form_VAP.png
   :width: 624 px

=====================================
Running Processes Defined in the PCM
=====================================

Processes defined in the PCM can be run either using the data_consolidator
application, or by developing an application specific to a particular
process. If your intent is simply to consolidate data from existing ARM
netCDF datastreams, with the application of unit and data type conversions
and coordinate dimension transformations, the Data Consolidator application
can run the process immediately after the required information has been
documented in the PCM, with no need for the user to write or compile any
code.

If the data product you want to create requires modifying the input data in
a manner not supported by the PCM, or includes variables that cannot be
derived from the input via the PCM, a software project specific to that
process is needed. create_adi_project is a source code generation tool that
uses the PCM database entries to create a C, IDL, or Python software
project for processes defined in the PCM. The projects it creates can
compile and run with no additional code, producing netCDF files with all
variables that can be derived from the database entries made via the PCM.
The source code produced has hooks into which users can insert their own
code, thus jump starting the development of their ARM Value Added Products
(VAPs).

-----------------------------
Define Environment Variables
-----------------------------

ADI shared libraries use environment variables to determine the location of
data, configuration files, and binaries. Users developing their own
application or running the data_consolidator application will need to set
both the data-related environment variables described below and the
process-related ones that follow. Any required subdirectories that ADI
expects at the environment variable location are listed.
- DATA_HOME: Base directory for data. On the ADI development system it is
  suggested that this directory be defined as /data/home/<user name>/data.

  Subdirectories: conf, datastream, logs, quicklook

  Example: /data/home/gaustad/data

- DATASTREAM_DATA: Location for ARM netCDF data. This should be defined as
  $DATA_HOME/datastream.

  Subdirectories: <site>/<datastream>

  Example with subdirectories:
  /data/home/gaustad/data/datastream/sgp/sgpsirsC1.b1

- LOGS_DATA: Location of logs generated during a run. This should be
  defined as $DATA_HOME/logs. Frequently organized in subdirectories.

- COLLECTION_DATA: Location of input data if it is not in $DATASTREAM_DATA.

- DATASTREAM_DATA_IN: Same as DATASTREAM_DATA, but only used to find the
  input datastream directories.

- DATASTREAM_DATA_OUT: Same as DATASTREAM_DATA, but only used to find the
  output datastream directories.

Additional environment variables whose default values should not be
changed, but which may be useful for developers writing additional source
code to perform analysis, are documented below. Two locations are provided
for storing configuration files used by the process. Where a configuration
file should be stored is a function of whether the configuration file is
routinely updated, or whether it is mostly unchanged (i.e., may be updated
a handful of times a year).

- VAP_HOME/vap/conf: Location of configuration files that do not change
  over time, or change at most once a year. As such, these files can and
  should be maintained in the VAP's GitLab repository.

- CONF_DATA: Location for configuration files that change more than once a
  year. Within this directory, files can be organized by site in
  $CONF_DATA/<site>/ or by VAP in $CONF_DATA/<vap name>/.

Methods of updating VAP configuration files in CONF_DATA
--------------------------------------------------------------

Because the files in $CONF_DATA are not released, an alternative method of
installing them on the production processing system is needed.
There are two possible methods of updating files in CONF_DATA: (1) create a
stand-alone task in ServiceNow to have the system administrators copy them
into the desired location, or (2) use doorstep to install the configuration
files. Details for both methods are described below.

- Updating files by requesting they be copied to production.

  This method is recommended for files that will be updated infrequently (a
  few times a year), or that only need to be transferred to production once
  when the VAP (or a new site for the VAP) is set up, because subsequent
  updates will be done automatically by the VAP process. Requests to
  transfer files to production should be made via ServiceNow, preferably in
  an ENG or EWO associated with the VAP or, if those are not available, in
  a stand-alone incident. Describe where the files should be installed in
  $CONF_DATA, note the location where the files to transfer to production
  can be found, and assign the request to the ADC system administrators.

- Updating files using doorstep.

  !!This method can currently ONLY be used to install files to
  $CONF_DATA/<site>/!!. As such, it only supports installation of conf
  files that require a separate file for each site and facility. To use
  this method:

  a. Notify the individual who will be providing the new or updated files
     to deliver them via ftp.arm.gov as 'anonymous', using their email as
     the password. They should place the files in the directory
     corresponding to the site and facility to which the conf files apply
     (i.e., /pub/sites/<site>/<facility>_conffiles).
  b. Submit a task in ServiceNow to have the doorstep.conf file updated.
     Preferably the task should be a child of an ENG or EWO associated with
     the VAP or, if those are not available, a stand-alone incident. Assign
     this task to Brian Ermold. Note the process name, the sites and
     facilities that will have files, and who should receive notification
     that files have been updated.
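With the directory layout suggested above, the data-related environment variables can be set programmatically before invoking an ADI process. A minimal Python sketch follows; the paths are placeholders based on the earlier example, and setting CONF_DATA to $DATA_HOME/conf is an assumption following the listed subdirectories.

```python
import os

# Hypothetical data root following the suggested layout; substitute your own.
data_home = "/data/home/gaustad/data"

os.environ["DATA_HOME"] = data_home
os.environ["DATASTREAM_DATA"] = os.path.join(data_home, "datastream")
os.environ["LOGS_DATA"] = os.path.join(data_home, "logs")
# Assumption: CONF_DATA kept under the 'conf' subdirectory of DATA_HOME.
os.environ["CONF_DATA"] = os.path.join(data_home, "conf")

print(os.environ["DATASTREAM_DATA"])
# -> /data/home/gaustad/data/datastream
```

Setting these in the shell profile of the account that runs the process accomplishes the same thing.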
------------------------
Data Consolidation Tool
------------------------

The ‘data_consolidator’ is an application that performs the transformations
and mappings from retrieved variables to output variables for any process
defined in the PCM. As such, it allows users to consolidate data from
diverse datastreams without the need to create or compile any source code.
It takes as input the name of the retriever process whose retrievals,
transformations, and input-to-output mappings are to be applied, along with
the typical ARM process arguments.

The data_consolidator command line arguments include the typical arguments
for any ADI process, with the addition of "-n <process name>" to specify
the process. The frequently used arguments include:

::

    -n <process name>
    -s <site>
    -f <facility>
    -b <begin date>
    -e <end date>
    -a <db alias> (possible values are "dsdb_ref" and "devws")
    -D <debug level> (level 2 will dump retrieved, transformed, and output structs)
    -P (to log provenance)
    -R (reprocessing flag to allow the overwrite of previously created netCDF files)

Additional arguments include:

::

    --log-dir <path> (path to the log file directory)
    --log-file <name> (name of the log file)
    --log-id <id> (replaces the timestamp in the log file name with the specified id)
    --max-runtime <value> (sets the max runtime for the process; 0 disables the max runtime check)
    --files <file list> (for ingests only, specifies a comma delimited list of files to process)
    --asynchronous (disables the process lock file, disables the check for
        chronological data processing, disables overlap checks with
        previously processed data, and forces a new file to be created for
        every output dataset)
    --dynamic-dods (creates a DOD on the fly when the process does not have
        one assigned to it in the PCM)

The --dynamic-dods option requires the following:

- A datastream name be entered into the PCM Process Inputs and Outputs form
  that does not have a DOD associated with it. The output file will use
  this datastream name.
- The PCM Retriever Editor have entries in the Outputs column that map the
  retrieved variable to the output datastream.
The names provided in the mapping will be the names used in the output file
produced.

With respect to the end date (-e), please note that the process will run
from the date specified as the begin date up to the end date (i.e., NOT
through the end date).

If the debug level is set to two (-D 2), the data_consolidator app will
dump the contents of the retrieval, transform, and output structures to a
subdirectory ‘debug_dumps’. The dump files created, and the structures they
contain, are listed below.

::

    .YYYYMMDD.HHMMSS.post_retrieval.debug
    .YYYYMMDD.HHMMSS.pre_transform.debug
    .YYYYMMDD.HHMMSS.post_transform.debug
    ..YYYYMMDD.HHMMSS.process_data.debug

-------------------
create_adi_project
-------------------

After the VAP process has been fully defined in the PCM and saved to the
DSDB, the create_adi_project application can be run to create a C, IDL, or
Python project comprised of a main module, hooks for the `ADI Data
Processing Modules`_, supporting files documenting retrieved, transformed,
and output variables, and the Makefiles needed to compile the VAP binary.
There are three templates that create a full project: two supporting VAP
development and one ingest template. The VAP templates include a
‘transform’ template that creates a project that includes a call to the ADI
transformation module, and a ‘retriever’ template that does not include
that call.

create_adi_project Command Line Arguments
------------------------------------------

The required input parameters for create_adi_project include the
specification of the process for which templates are being produced, the
template type, and the directory into which the templates will be created.
Optional input parameters are provided to document the source code with the
developer’s contact information, to produce a dump of the DSDB elements
associated with the process into a JSON data file, and to run from such a
JSON dump rather than accessing process information from the DSDB.
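To make the required parameters concrete, here is a hypothetical invocation sketch in Python. The process name ``example_vap`` and the template value ``vap_transform`` are assumptions, and the required output-directory argument is omitted because its exact flag is not documented here.

```python
import subprocess  # shown only to illustrate how the command would be launched

# Hypothetical invocation: only the documented -p and -t flags are used,
# and both values are placeholders.
cmd = ["create_adi_project",
       "-p", "example_vap",      # process name as defined in the PCM
       "-t", "vap_transform"]    # VAP 'transform' template (assumed value)

print(" ".join(cmd))
# On a system with ADI installed, the command could be run with:
# subprocess.run(cmd, check=True)
```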
A complete summary of the create_adi_project command line options is shown
in the following table along with an example.

+-------------------------------------------------------------------------+
| create_adi_project Usage                                                |
+==================+===============+========+=============================+
| **Input**        | **Argument**  | **Req**| **Argument Description**    |
| **Arguments**    | **Value**     |        |                             |
+----+-------------+---------------+--------+-----------------------------+
| -h | --help      |               | N/A    |                             |
+----+-------------+---------------+--------+-----------------------------+
| -p | --process   |               | Yes    | Name of process defined in  |
|    |             |               |        | the PCM                     |
+----+-------------+---------------+--------+-----------------------------+
| -t | --template  |