Transformation that converts design matrix into records
When working on DOE (design of experiments), it is currently quite cumbersome to read the parameters from the design matrix file (e.g. by means of an external job) and then create a JSON representation of each parameter, one by one, in order to load them explicitly as records later on. It would therefore be useful to have a transformation that reads such a design matrix CSV file and creates the parameter records automatically.
For reference, this is the corresponding code for ERT2: https://github.com/equinor/semeio/blob/main/semeio/jobs/design2params/design2params.py
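A minimal sketch of the reading step follows. It assumes a comma-separated file with a header row, where the first column holds the realization index and the remaining columns hold numeric parameter values. The function name `read_design_matrix` and this file layout are hypothetical, and the sketch only handles numeric values.

```python
import csv
import io
import json
from typing import Dict, TextIO


def read_design_matrix(stream: TextIO) -> Dict[int, Dict[str, float]]:
    """Map each realization index to its scalar parameter values.

    Assumed (hypothetical) layout: a header row whose first column is the
    realization index and whose remaining columns are parameter names.
    """
    reader = csv.reader(stream)
    header = next(reader)
    parameter_names = header[1:]
    records: Dict[int, Dict[str, float]] = {}
    for row in reader:
        if not row:
            continue  # tolerate blank lines
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} columns, got {len(row)}: {row}")
        realization = int(row[0])
        records[realization] = {
            name: float(value) for name, value in zip(parameter_names, row[1:])
        }
    return records


# Usage: one parameter record per realization, instead of writing JSON by hand.
matrix = io.StringIO("REAL,a,b\n0,1.0,2.5\n1,1.5,3.0\n")
records = read_design_matrix(matrix)
print(json.dumps(records[0]))  # {"a": 1.0, "b": 2.5}
```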
Transformation multiplexing/singleplexing
There's clearly a need for multiplexing transformations, i.e. transformations that create multiple records from one file, or vice versa, or both. Transformations already do singleplexing with from_record and to_record. Multiplexing would introduce to_records and from_records.
This increases the complexity of the transformation API. So, to make this livable, we need very strict rules for how *plexing is dealt with.
E.g.
- SerializationTransformation does not currently allow any multiplexing.
- CopyTransformation is singleplexing only.
- DesignMatrixTransformation allows both multiplexing and singleplexing.
- EclSumTransformation will only allow one-directional singleplexing, because it produces a "single" record tree, from file to record only.
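The rules above could be made explicit as data, so that an unsupported combination is rejected up front rather than discovered during a run. This is only a sketch: the `Direction` and `Plexing` enums, the `CAPABILITIES` table and `require` are hypothetical names. Transformations are identified by class name to keep the example self-contained, and those not described as one-directional are assumed to support both directions.

```python
from enum import Enum, auto
from typing import Dict, FrozenSet, Tuple


class Direction(Enum):
    TO_RECORD = auto()    # file(s) -> record(s)
    FROM_RECORD = auto()  # record(s) -> file(s)


class Plexing(Enum):
    SINGLE = auto()  # one file <-> one record
    MULTI = auto()   # one file <-> many records, or vice versa


Capability = Tuple[Direction, Plexing]

# Hypothetical table encoding the rules listed above. Transformations not
# described as one-directional are assumed to support both directions.
CAPABILITIES: Dict[str, FrozenSet[Capability]] = {
    "SerializationTransformation": frozenset((d, Plexing.SINGLE) for d in Direction),
    "CopyTransformation": frozenset((d, Plexing.SINGLE) for d in Direction),
    "DesignMatrixTransformation": frozenset((d, p) for d in Direction for p in Plexing),
    "EclSumTransformation": frozenset({(Direction.TO_RECORD, Plexing.SINGLE)}),
}


def require(transformation: str, direction: Direction, plexing: Plexing) -> None:
    """Fail immediately if a transformation does not support the request."""
    if (direction, plexing) not in CAPABILITIES[transformation]:
        raise ValueError(
            f"{transformation} does not support {plexing.name} "
            f"in direction {direction.name}"
        )
```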
When a DesignMatrixTransformation instance is configured and created, it should be decided what kind of *plexing that instance will do, so the instance factory would need some form of rule or heuristic for this. The point is that consumers of the transformation shouldn't have to guess, and creation should fail immediately if something is unclear or unsupported.
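One possible shape for such a factory is sketched below. It assumes that the configuration states explicitly whether the design matrix should become one grouped record or one record per parameter. `DesignMatrixConfig`, its `group` and `split_records` fields, and `create_design_matrix_transformation` are hypothetical. The point is that the *plexing is fixed when the instance is created, and an ambiguous or incomplete configuration raises at once.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class Plexing(Enum):
    SINGLE = "single"
    MULTI = "multi"


@dataclass(frozen=True)
class DesignMatrixConfig:
    # Hypothetical configuration fields, for illustration only.
    location: Path
    group: Optional[str] = None  # -> one record holding the whole group
    split_records: bool = False  # -> one record per parameter


@dataclass(frozen=True)
class DesignMatrixTransformation:
    location: Path
    plexing: Plexing  # fixed at creation time, so consumers never guess
    group: Optional[str] = None


def create_design_matrix_transformation(
    config: DesignMatrixConfig,
) -> DesignMatrixTransformation:
    """Decide the *plexing up front and fail fast on unclear configuration."""
    if config.group is not None and config.split_records:
        raise ValueError("ambiguous *plexing: 'group' and 'split_records' both set")
    if config.split_records:
        return DesignMatrixTransformation(config.location, Plexing.MULTI)
    if config.group is not None:
        return DesignMatrixTransformation(config.location, Plexing.SINGLE, config.group)
    raise ValueError("cannot decide *plexing: set either 'group' or 'split_records'")
```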
Further, I initially thought that the interface would be something like this:
async def to_record(self, root_path: Path = Path()) -> Record: ...
async def to_records(self, root_path: Path = Path()) -> RecordCollection: ...
but a significant use of design matrices is to create only one group (which we can call design_matrix). This group is only part of the parameters to be defined; other parameters might come from a stochastic source.
So RecordCollection is too generic and too simple a data model for this. A RecordCollection should support:
- selecting a subset based on grouping (in the parameter dimension)
- being built iteratively, meaning multiple sources can together constitute a record
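A rough sketch of a collection with both properties follows. It assumes that each realization's record maps group names to scalar parameters. `GroupedRecordCollection`, `add_group`, `select` and `record` are hypothetical names and do not reflect the existing RecordCollection API.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List

# One entry per realization: {parameter name: scalar value}
GroupValues = List[Dict[str, float]]


@dataclass
class GroupedRecordCollection:
    """Hypothetical collection keyed by group in the parameter dimension."""

    ensemble_size: int
    groups: Dict[str, GroupValues] = field(default_factory=dict)

    def add_group(self, name: str, values: GroupValues) -> None:
        """Add one source's parameters; called once per source (iterative build)."""
        if name in self.groups:
            raise ValueError(f"group {name!r} is already defined")
        if len(values) != self.ensemble_size:
            raise ValueError(
                f"group {name!r} has {len(values)} realizations, "
                f"expected {self.ensemble_size}"
            )
        self.groups[name] = values

    def select(self, names: Iterable[str]) -> "GroupedRecordCollection":
        """Return a new collection with only the named groups (values are shared)."""
        subset = GroupedRecordCollection(self.ensemble_size)
        for name in names:
            subset.add_group(name, self.groups[name])
        return subset

    def record(self, realization: int) -> Dict[str, Dict[str, float]]:
        """Assemble the full parameter record for one realization."""
        return {name: values[realization] for name, values in self.groups.items()}


# Usage: a design-matrix group combined with a stochastic group.
parameters = GroupedRecordCollection(ensemble_size=2)
parameters.add_group("design_matrix", [{"a": 1.0}, {"a": 2.0}])
parameters.add_group("stochastic", [{"x": 0.1}, {"x": 0.2}])
```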
For DOE, the parameters are also scalars rather than vectors, so #2934 blocks this.
TBC…
Closing as related to ert3, which is no longer the direction taken by the project. Feel free to reopen if still relevant.
This one relates to code in the ert.data package and is still relevant.
Closing in favor of https://github.com/equinor/ert/issues/4656