Datasets terminology misleading & contradictory
Please find attached a terminal-window output named crossRef_1.rtf (best viewed in rich text format and full screen, as the rows are very wide: there is one ODP dataset per row).
This is the output of an analysis Python program written by the ESA Climate Office which cross-references three things - (i) the GetRecords XML response from CEDA's CCI data node, (ii) an ESGF XML response for CCI data records, and (iii) terminal window output from CCI Toolbox CLI listing of data sources converted to very simple XML by hand. These three data sources are also attached.
In short, this output expresses the big picture of data across ESGF + CCI ODP + CCI Toolbox.
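For readers without the attachments, the kind of cross-referencing described above can be sketched roughly as follows. This is not the ESA Climate Office program: the element names (`dataset`, `alt`, `doc`, `source`), the identifiers and the sample XML are all hypothetical stand-ins, and the sketch assumes that Toolbox data source identifiers can be matched directly against ESGF record identifiers.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified stand-ins for the three attached inputs.
# Real GetRecords / ESGF responses use different (namespaced) element names.
CSW_XML = """<records>
  <dataset id="odp-ds-1"><alt>esgf.rec.a</alt><alt>esgf.rec.b</alt></dataset>
  <dataset id="odp-ds-2"><alt>esgf.rec.c</alt></dataset>
</records>"""
ESGF_XML = """<response>
  <doc id="esgf.rec.a"/><doc id="esgf.rec.b"/><doc id="esgf.rec.c"/>
</response>"""
TOOLBOX_XML = """<sources>
  <source id="esgf.rec.a"/><source id="esgf.rec.b"/>
</sources>"""


def ids(xml_text, tag):
    """Collect the 'id' attribute of every element with the given tag."""
    root = ET.fromstring(xml_text)
    return {el.get("id") for el in root.iter(tag)}


def cross_reference(csw_xml, esgf_xml, toolbox_xml):
    """Return one row per ODP dataset, listing which of its alternate
    identifiers also appear in ESGF and in the Toolbox listing."""
    esgf_ids = ids(esgf_xml, "doc")
    toolbox_ids = ids(toolbox_xml, "source")
    rows = []
    for ds in ET.fromstring(csw_xml).iter("dataset"):
        alts = [a.text for a in ds.iter("alt")]
        rows.append({
            "dataset": ds.get("id"),
            "in_esgf": [a for a in alts if a in esgf_ids],
            "in_toolbox": [a for a in alts if a in toolbox_ids],
        })
    return rows


rows = cross_reference(CSW_XML, ESGF_XML, TOOLBOX_XML)
for row in rows:
    print(row["dataset"], len(row["in_esgf"]), len(row["in_toolbox"]))
```

Each printed row corresponds to one ODP dataset, which is the same layout as the rows in the attached output.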
The issue is as follows: there is a fundamental terminological contradiction between the CCI Toolbox and the CCI ODP. This requires fixing on the CCI Toolbox side.
The CCI Toolbox documentation refers to the (atomic) data coming from CEDA as "data sources", but often uses other terms as well (e.g. "datasets"), whereas CEDA calls the rows listed in the attached output "datasets" and the associated things in ESGF "alternate identifiers" (or words to that effect in the GetRecords response).
The CCI Toolbox team are to settle on and clarify terminology which matches the CCI ODP (the primary source of data on which the CCI Toolbox depends) and is clearly consistent and mappable to the CCI ODP.
At the moment only about 37 CCI datasets (using the CCI ODP terminology here) are directly accessible by the CCI Toolbox (CLI). It is therefore misleading for the CCI Toolbox team to claim, for example, that 'The toolbox can access <about 120 or so> data sets from CCI ODP', as these are not actually datasets.
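To illustrate why the two counts differ, here is a minimal sketch that groups Toolbox "data sources" under their parent CCI ODP dataset. The mapping and all identifiers are hypothetical and only show the shape of the problem; the real figures come from the attached cross-reference output.

```python
from collections import defaultdict

# Hypothetical mapping: Toolbox "data source" id -> parent CCI ODP dataset id.
SOURCE_TO_DATASET = {
    "source.a.day": "odp-ds-1",
    "source.a.mon": "odp-ds-1",
    "source.b.mon": "odp-ds-2",
}


def group_by_dataset(source_to_dataset):
    """Group data source ids under the ODP dataset they belong to."""
    grouped = defaultdict(list)
    for source_id, dataset_id in source_to_dataset.items():
        grouped[dataset_id].append(source_id)
    return dict(grouped)


grouped = group_by_dataset(SOURCE_TO_DATASET)
print(f"{len(SOURCE_TO_DATASET)} data sources, {len(grouped)} datasets")
```

Counting the keys of the mapping gives the number of data sources, while counting the groups gives the number of datasets in the CCI ODP sense; the documentation should report the latter when it uses the word "dataset".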
To fix this issue, the following must be implemented across all Software and Training Material:
- Clarification of data terminology in the documentation.
- Fixing of use of the term dataset throughout.