[UNDER-DISCUSSION] Format of a Realization

Open PaulTalbot-INL opened this issue 5 years ago • 2 comments


Under Discussion Topic

In a team discussion on June 12, 2020, we considered how we work with data internally in RAVEN, specifically with regard to realizations.

Traditionally, realizations have been represented internally as dictionaries, with some "magic" variables such as _indexMap carrying important data and little distinction between sampled variables and meta variables.
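As a rough illustration of the dictionary form described above, here is a minimal sketch. The variable names (`x`, `y`, `time`) and all values are hypothetical; only `_indexMap` and `SamplerType` are names that appear elsewhere in this issue, and the exact way RAVEN stores their values is an assumption here.

```python
# Hypothetical dictionary-style realization; names and values are illustrative only.
rlz = {
  'x': [1.5],                       # sampled scalar input
  'time': [0.0, 1.0, 2.0],          # index (coordinate) values
  'y': [0.1, 0.4, 0.9],             # output that depends on 'time'
  'SamplerType': ['MonteCarlo'],    # meta entry mixed in with the data (assumed layout)
  '_indexMap': [{'y': ['time']}],   # "magic" variable: which indexes each variable depends on
}

# Consumers have to separate data from meta and index entries by hand.
indexMap = rlz.get('_indexMap', [{}])[0]
metaKeys = {'_indexMap', 'SamplerType'}
indexKeys = {idx for dims in indexMap.values() for idx in dims}
dataVars = [key for key in rlz if key not in metaKeys | indexKeys]
```

Every consumer of the dictionary needs to know which keys are "magic", which is the coupling this issue hopes to remove.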

Since the move to xarray-based data objects, we can directly use objects better suited to data transfer than dictionaries, such as xarray DataArrays or DataFrames.

Similarly, the Sampler constructs each realization to be sampled as a dictionary.

We could consider a new entity, the Realization, that has the flexibility to use several different underlying data structures (such as numpy arrays, dictionaries, or xarray objects) but offers consistent getter and setter methods for storing and retrieving data.

These Realizations should ideally interoperate seamlessly with the DataObject, reducing where possible the computational cost of gathering data into a DataObject.

Further, the "meta" information previously stored as variables can instead be stored on the Realization member itself, without mixing it into the traditional variables.
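One way such an entity might look is sketched below. This is a hypothetical, dictionary-backed illustration, not an existing RAVEN class: the class name, the `meta` attribute, and the choice of `collections.abc.MutableMapping` as a base are all assumptions made for the example.

```python
from collections.abc import MutableMapping

class Realization(MutableMapping):
  """Hypothetical sketch: variables and meta information are kept apart."""

  def __init__(self, values=None, meta=None):
    # Underlying storage is a plain dict here; another backend could be
    # swapped in behind the same getter/setter methods.
    self._values = dict(values or {})
    # Meta information lives on the object, not among the variables.
    self.meta = dict(meta or {})

  def __getitem__(self, var):
    return self._values[var]

  def __setitem__(self, var, value):
    self._values[var] = value

  def __delitem__(self, var):
    del self._values[var]

  def __iter__(self):
    return iter(self._values)

  def __len__(self):
    return len(self._values)

# Usage: dictionary-style access still works, but meta entries are not variables.
rlz = Realization({'x': 1.0}, meta={'SamplerType': 'MonteCarlo'})
rlz['y'] = 2.0
```

Deriving from `MutableMapping` means code that currently treats a realization as a dictionary (`rlz.get(...)`, `for var in rlz`) would keep working, which might ease a gradual transition.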


For Change Control Board: Issue Review

This review should occur before any development is performed as a response to this issue.

  • [ ] 1. Is it tagged with the under_discussion type?
  • [ ] 2. If implemented, will it add a new requirement?
  • [ ] 3. Is a rationale provided? (Such as explaining why the improvement is needed)

For Change Control Board: Issue Closure

This review should occur when the issue is imminently going to be closed.

  • [ ] 1. Did the discussion determine the addition of a new task issue?

PaulTalbot-INL avatar Jun 12 '20 15:06 PaulTalbot-INL

A useful feature of this Realization would be the ability to transform itself into a single-sample xarray dataset, which may greatly simplify the DataObject's data merging, perhaps even enough to remove the split between the dataset._collector and dataset._data structures.

For example:

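    import xarray as xr

    # `rlz` (the realization dictionary) and `counter` (the sample ID) come from the calling context
    indexMap = rlz.get('_indexMap', [{}])[0]
    # names of all indexes (coordinates) used by any variable
    indices = set().union(*(set(x) for x in indexMap.values()))
    # entries that are indexes or meta information, not data variables
    skip = indices | {'_indexMap', 'SampledVars', 'SampledVarsPb', 'crowDist', 'SamplerType'}
    # verbose but slower
    xarrs = {}
    for var in rlz:
      if var in skip:
        continue
      vals = rlz[var]
      dims = indexMap.get(var, [])
      # unwrap length-1 arrays for variables with no indexes (scalars)
      if not dims and len(vals) == 1:
        vals = vals[0]
      coords = {idx: rlz[idx] for idx in dims}
      # add the sample ID as a leading dimension of length 1
      xarrs[var] = xr.DataArray(vals, dims=dims, coords=coords).expand_dims(dim={'RAVEN_sample_ID': [counter]})
    rlzDS = xr.Dataset(xarrs)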

PaulTalbot-INL avatar Mar 04 '21 16:03 PaulTalbot-INL

Further, information such as the indexMap could be tracked without being a realization variable.
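A minimal sketch of that idea, assuming a hypothetical container whose index map is an attribute rather than a `_indexMap` entry among the variables. The class, field, and method names are illustrative and do not refer to existing RAVEN code.

```python
from dataclasses import dataclass, field

@dataclass
class Realization:
  """Hypothetical container; the index map is an attribute, not a variable."""
  values: dict = field(default_factory=dict)
  indexMap: dict = field(default_factory=dict)  # variable name -> list of index names

  def dims(self, var):
    """Index names that `var` depends on (empty list for scalars)."""
    return list(self.indexMap.get(var, []))

  def coords(self, var):
    """Coordinate values for each index of `var`."""
    return {idx: self.values[idx] for idx in self.dims(var)}

# Illustrative values only.
rlz = Realization(
  values={'time': [0.0, 1.0, 2.0], 'y': [0.1, 0.4, 0.9], 'x': [1.5]},
  indexMap={'y': ['time']},
)
```

With this layout, a conversion routine could ask for `rlz.dims(var)` and `rlz.coords(var)` directly instead of filtering `_indexMap` out of the variable list.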

PaulTalbot-INL avatar Sep 30 '21 20:09 PaulTalbot-INL