Unified tagging of DataFrames and DataSets
I will write more about how, eventually.
Currently we have:
- DataSet: abstract base class/interface of a rectangular, homogeneous n-dimensional data container, i.e. it has dataExtent() → NDSize (n-dim rectangular) and dataType() → DataType (homogeneous).
- DataArray: is a DataSet and is the frontend class for the actual backing store of DataSet-like data. DataArrays have dimension descriptors and a unit for the data itself.
- DataView: is a DataSet and represents a view of a subset of the data in a DataArray, i.e. it is a hyperrectangle of size count (NDSize) starting at offset (NDSize).
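To make the relationship between the three classes concrete, here is a minimal C++ sketch. It is a simplified stand-in, not the actual nix API: NDSize is modeled as a plain std::vector, the DataType enum is cut down to a few values, and DataArray omits its storage, dimensions and unit. The point is that a DataView reports its count as its own extent and checks that offset + count stays inside the underlying DataArray.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the NIX types; not the real nix API.
using NDSize = std::vector<std::size_t>;
enum class DataType { Double, Int32, String };

// DataSet: rectangular (one NDSize extent) and homogeneous (one DataType).
class DataSet {
public:
    virtual ~DataSet() = default;
    virtual NDSize dataExtent() const = 0;
    virtual DataType dataType() const = 0;
};

// DataArray: frontend for the backing store (storage omitted in this sketch).
class DataArray : public DataSet {
public:
    explicit DataArray(NDSize extent) : extent_(std::move(extent)) {}
    NDSize dataExtent() const override { return extent_; }
    DataType dataType() const override { return DataType::Double; }
private:
    NDSize extent_;
};

// DataView: a hyperrectangle of size `count` starting at `offset` in a DataArray.
class DataView : public DataSet {
public:
    DataView(const DataArray &array, NDSize offset, NDSize count)
        : array_(array), offset_(std::move(offset)), count_(std::move(count)) {
        const NDSize extent = array_.dataExtent();
        if (offset_.size() != extent.size() || count_.size() != extent.size())
            throw std::invalid_argument("rank mismatch");
        for (std::size_t i = 0; i < extent.size(); ++i)
            if (offset_[i] + count_[i] > extent[i])
                throw std::out_of_range("view exceeds array bounds");
    }
    // The view's extent is its count, not the extent of the underlying array.
    NDSize dataExtent() const override { return count_; }
    // The element type is inherited from the underlying array.
    DataType dataType() const override { return array_.dataType(); }
private:
    const DataArray &array_;
    NDSize offset_, count_;
};
```

For example, a view with offset {2, 1} and count {5, 3} on a 10×4 array reports an extent of {5, 3}, while an offset of 8 with a count of 5 along the first axis is rejected.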
Additionally, we have the new DataFrame, a rectangular data container consisting of n columns (each with a name, unit and DataType) and m rows.
Currently, tagging works by giving a tag one or more position+extent pairs and pointers to the referenced DataArrays, whose dimensionality must match that of the positions and extents.
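The dimensionality constraint can be sketched as a simple check. This is a hypothetical, simplified model: the MultiTag and Reference structs and the dimensionalityMatches function are illustrative names, not the nix API, and positions/extents are stored here as plain vectors of doubles. Each position and extent must have exactly one entry per dimension of every referenced DataArray.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified model; names are illustrative, not the nix API.
using NDSize = std::vector<std::size_t>;

struct Reference {
    NDSize extent;  // data extent of the referenced DataArray
};

// A tag holding several regions: positions[i] and extents[i] describe region i.
struct MultiTag {
    std::vector<std::vector<double>> positions;
    std::vector<std::vector<double>> extents;
    std::vector<Reference> references;
};

// Every position/extent must have as many entries as each referenced array
// has dimensions; otherwise the tag cannot be applied to that reference.
bool dimensionalityMatches(const MultiTag &tag) {
    if (tag.positions.size() != tag.extents.size()) return false;
    for (const Reference &ref : tag.references) {
        const std::size_t rank = ref.extent.size();
        for (std::size_t i = 0; i < tag.positions.size(); ++i) {
            if (tag.positions[i].size() != rank || tag.extents[i].size() != rank)
                return false;
        }
    }
    return true;
}
```

Because the check is phrased in terms of DataArray extents only, a DataFrame reference has no obvious place in it, which is what motivates the options below.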
To allow unified tagging, i.e. tagging of both DataArrays and DataFrames, the references must be changed to point to either:
- a common base object that DataArray and DataFrame derive from, or
- a (new) intermediate object that in turn points to a DataArray or a DataFrame, possibly with additional specifications of how position and extent are applied.
The latter is the more complicated but more flexible solution; the former is more straightforward and easier to implement.
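To make the second option (the intermediate object) more concrete, here is a minimal sketch using std::variant. All names are hypothetical: TagReference, positionColumn and requiredRank are illustrative, and the rule that a DataFrame is tagged along its rows only (one dimension) is an assumption made for the example, standing in for the "additional specifications" mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <variant>
#include <vector>

// Hypothetical minimal targets; real DataArray/DataFrame carry much more.
struct DataArray { std::vector<std::size_t> extent; };
struct DataFrame { std::vector<std::string> columnNames; std::size_t rows = 0; };

// Option 2: an intermediate reference object. It points to either target and
// carries extra information on how a tag's position/extent is interpreted.
struct TagReference {
    std::variant<const DataArray *, const DataFrame *> target;
    // Illustrative only: for a DataFrame, the column whose values the
    // position/extent are matched against (e.g. a "time" column).
    std::string positionColumn;
};

// Number of position/extent entries the tag must provide for this reference.
std::size_t requiredRank(const TagReference &ref) {
    if (std::holds_alternative<const DataArray *>(ref.target))
        return std::get<const DataArray *>(ref.target)->extent.size();
    // Assumption for this sketch: a DataFrame is tagged along its rows only.
    return 1;
}
```

The flexibility comes from the extra fields on the intermediate object; the cost is that every consumer of a tag has to dispatch on the target type, which the common-base approach avoids.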
The common base object could be the existing DataSet, if it were extended to include Dimensions and units. DataView would then need to be amended to provide those as well. The tricky part is the dimensions: a DataView's dimensions would have to be the Dimensions of the underlying DataArray with the view (offset+count) applied. The Tags would need to be changed to work only with DataSets, both for references and for retrieveData.
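Applying a view (offset+count) to a dimension descriptor differs per dimension type. The following sketch uses hypothetical, simplified types loosely modeled on NIX's sampled, range and set dimensions; viewDimension is an illustrative name, not an existing function. For a regularly sampled dimension only the offset shifts, while for range and set dimensions the ticks or labels are sliced.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical simplified dimension types, loosely modeled on NIX's
// sampled, range and set dimensions.
struct SampledDimension { double samplingInterval; double offset; };
struct RangeDimension { std::vector<double> ticks; };
struct SetDimension { std::vector<std::string> labels; };
using Dimension = std::variant<SampledDimension, RangeDimension, SetDimension>;

// Apply a view (start index + count along this dimension) to a dimension
// descriptor, yielding the descriptor the view exposes.
Dimension viewDimension(const Dimension &dim, std::size_t start, std::size_t count) {
    const auto first = static_cast<std::ptrdiff_t>(start);
    const auto last = static_cast<std::ptrdiff_t>(start + count);
    if (const auto *s = std::get_if<SampledDimension>(&dim)) {
        // Regular sampling stores no length, so only the offset shifts.
        return SampledDimension{s->samplingInterval,
                                s->offset + static_cast<double>(start) * s->samplingInterval};
    }
    if (const auto *r = std::get_if<RangeDimension>(&dim)) {
        if (start + count > r->ticks.size()) throw std::out_of_range("view exceeds ticks");
        return RangeDimension{std::vector<double>(r->ticks.begin() + first,
                                                  r->ticks.begin() + last)};
    }
    const auto &set = std::get<SetDimension>(dim);
    if (start + count > set.labels.size()) throw std::out_of_range("view exceeds labels");
    return SetDimension{std::vector<std::string>(set.labels.begin() + first,
                                                 set.labels.begin() + last)};
}
```

For instance, viewing a sampled dimension with interval 0.5 and offset 1.0 from index 4 yields a sampled dimension with offset 3.0; a range dimension with ticks {0.0, 0.1, 0.4, 0.9} viewed at start 1, count 2 yields ticks {0.1, 0.4}.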
Another new object would be needed to represent a view of a DataFrame, much like DataView does for DataArray: FrameView (name subject to change), which implements DataSet (i.e. a FrameView is a DataSet). The reference in the file format (attributes in HDF5) would need to be amended to store everything needed to re-create that FrameView.
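A possible shape for FrameView is sketched below. Everything here is hypothetical: the simplified DataSet/DataFrame types, the choice of a contiguous row range plus a column selection, and the attribute names in toAttributes() are all made up for illustration. One consequence of requiring that a FrameView is a DataSet shows up directly: since DataSet is homogeneous, dataType() is only well-defined when all selected columns share one DataType, so this sketch throws otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified types; not the nix API.
using NDSize = std::vector<std::size_t>;
enum class DataType { Double, Int32, String };
struct Column { std::string name; std::string unit; DataType type; };
struct DataFrame { std::string name; std::vector<Column> columns; std::size_t rows; };

class DataSet {
public:
    virtual ~DataSet() = default;
    virtual NDSize dataExtent() const = 0;
    virtual DataType dataType() const = 0;
};

// A view on a DataFrame: a contiguous row range of selected columns.
class FrameView : public DataSet {
public:
    FrameView(const DataFrame &frame, std::vector<std::size_t> columns,
              std::size_t rowOffset, std::size_t rowCount)
        : frame_(frame), columns_(std::move(columns)),
          rowOffset_(rowOffset), rowCount_(rowCount) {
        if (columns_.empty()) throw std::invalid_argument("no columns selected");
        if (rowOffset_ + rowCount_ > frame_.rows) throw std::out_of_range("rows out of range");
        for (std::size_t c : columns_)
            if (c >= frame_.columns.size()) throw std::out_of_range("no such column");
    }

    // Rectangular: rows x selected columns.
    NDSize dataExtent() const override { return {rowCount_, columns_.size()}; }

    // Homogeneous only if all selected columns share one DataType.
    DataType dataType() const override {
        const DataType t = frame_.columns[columns_.front()].type;
        for (std::size_t c : columns_)
            if (frame_.columns[c].type != t) throw std::runtime_error("mixed column types");
        return t;
    }

    // What a reference would need to store (e.g. as HDF5 attributes) to
    // re-create this view. Attribute names are made up for illustration.
    std::map<std::string, std::string> toAttributes() const {
        std::string cols;
        for (std::size_t i = 0; i < columns_.size(); ++i)
            cols += (i ? "," : "") + frame_.columns[columns_[i]].name;
        return {{"frame", frame_.name},
                {"columns", cols},
                {"row_offset", std::to_string(rowOffset_)},
                {"row_count", std::to_string(rowCount_)}};
    }

private:
    const DataFrame &frame_;
    std::vector<std::size_t> columns_;
    std::size_t rowOffset_, rowCount_;
};
```

Storing column names rather than indices in the attributes is one design option; it keeps the reference readable but breaks if columns are renamed, so indices may be the safer choice in practice.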