Transformation that converts design matrix into records

Open xjules opened this issue 4 years ago • 4 comments

When working on DOE (design of experiments), it is currently quite cumbersome to read the parameters from the design matrix file (e.g. by means of an external job), create a JSON representation of each parameter one by one, and then load them explicitly as records. It would therefore be useful to have a transformation that reads such a design matrix CSV file and creates the parameter records automatically.
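As a rough illustration of the request, the sketch below reads a design matrix from CSV text and produces one parameter mapping per realization. The layout (a header row of parameter names followed by one numeric row per realization) and the function name `design_matrix_to_records` are assumptions made for this example; they are not taken from ert or semeio.

```python
import csv
import io
from typing import Dict, List


def design_matrix_to_records(csv_text: str) -> List[Dict[str, float]]:
    """Turn a design matrix into one parameter mapping per realization.

    Assumed layout (hypothetical): a header row of parameter names,
    then one row of numeric values per realization.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Every cell is treated as a scalar float parameter value.
        records.append({name: float(value) for name, value in row.items()})
    return records


example = "a,b\n1.0,10\n2.0,20\n"
records = design_matrix_to_records(example)
```

Each entry in `records` could then be handed to whatever record type the transformation ends up producing.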

xjules avatar Feb 04 '22 11:02 xjules

For reference, this is the corresponding code for ERT2: https://github.com/equinor/semeio/blob/main/semeio/jobs/design2params/design2params.py

berland avatar Feb 04 '22 11:02 berland

Transformation multiplexing/singleplexing

There's clearly a need for multiplexing transformations, i.e. transformations that create multiple records from one file, or vice-versa—or both. Transformations already do singleplexing with from_record and to_record. Multiplexing would introduce to_records and from_records.

This increases the complexity of the transformation API. So, to make this livable, we need very strict rules for how *plexing is dealt with.

E.g.

  • SerializationTransformation does not currently allow any multiplexing.
  • CopyTransformation is singleplexing only.
  • DesignMatrixTransformation allows multiplexing and singleplexing.
  • EclSumTransformation will only allow one-directional singleplexing, because it produces a "single" record tree from file to record only.

Once a DesignMatrixTransformation instance has been configured and created, it should already be decided what kind of *plexing the instance will do, so some form of rule or heuristic would live in the instance factory. The point is that consumers of the transformation shouldn't have to guess, and the transformation should fail immediately if something is unclear or unsupported.
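A minimal sketch of such a factory rule, assuming the mode can be derived from how many record names the transformation is configured with. `Plexing`, `DesignMatrixConfig`, and `decide_plexing` are hypothetical names for this example and do not exist in ert.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence


class Plexing(Enum):
    SINGLE = "single"
    MULTI = "multi"


@dataclass(frozen=True)
class DesignMatrixConfig:
    # Hypothetical configuration: file location and the records it maps to.
    location: str
    record_names: Sequence[str]


def decide_plexing(config: DesignMatrixConfig) -> Plexing:
    """Pick the plexing mode once, at creation time, or fail immediately."""
    names = list(config.record_names)
    if not names:
        raise ValueError("no record names configured; cannot decide plexing mode")
    if len(set(names)) != len(names):
        raise ValueError("duplicate record names; plexing mode is ambiguous")
    # One target record means singleplexing, several mean multiplexing.
    return Plexing.SINGLE if len(names) == 1 else Plexing.MULTI
```

Deciding the mode in one place means callers can inspect it up front instead of discovering an unsupported combination when to_record or to_records is first awaited.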

Further, I initially thought that the interface would be something like this:

async def to_record(self, root_path: Path = Path()) -> Record: ...

async def to_records(self, root_path: Path = Path()) -> RecordCollection: ...

but a significant use of design matrices is to create only one group (which we can call design_matrix). This group is only part of the parameters that are to be defined; other parameters might come from a stochastic source.

So RecordCollection is too generic and too simple a data model for this. A RecordCollection should support

  • selecting a subset based on grouping (in the parameter dimension)
  • being built iteratively, meaning multiple sources can constitute a record
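The two requirements above can be sketched with a small container that groups scalar parameters by name and lets several sources add to the same group. `GroupedRecords` is a hypothetical stand-in written for this example; it is not the RecordCollection class from ert.

```python
from typing import Dict, Iterable, Mapping


class GroupedRecords:
    """Hypothetical stand-in for a richer RecordCollection."""

    def __init__(self) -> None:
        self._groups: Dict[str, Dict[str, float]] = {}

    def add(self, group: str, values: Mapping[str, float]) -> None:
        # Merge into an existing group so several sources can contribute.
        self._groups.setdefault(group, {}).update(values)

    def select(self, groups: Iterable[str]) -> "GroupedRecords":
        # Return a new collection holding only the requested groups.
        subset = GroupedRecords()
        for group in groups:
            subset.add(group, self._groups[group])
        return subset

    def as_dict(self) -> Dict[str, Dict[str, float]]:
        return {group: dict(values) for group, values in self._groups.items()}


params = GroupedRecords()
params.add("design_matrix", {"a": 1.0})  # first source
params.add("design_matrix", {"b": 2.0})  # second source, same group
params.add("stochastic", {"c": 0.3})     # unrelated group
design_only = params.select(["design_matrix"])
```

Here the design_matrix group is assembled from two separate calls, and `select` picks it out without touching the stochastic parameters.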

For DOE the parameters are also scalars rather than vectors, so #2934 blocks this.

TBC…

jondequinor avatar Mar 16 '22 08:03 jondequinor

Closing as related to ert3, which is no longer the direction taken by the project. Feel free to reopen if still relevant.

eivindjahren avatar Sep 15 '22 13:09 eivindjahren

This one relates to code in the ert.data package and is still relevant.

sondreso avatar Sep 16 '22 08:09 sondreso

Closing in favor of https://github.com/equinor/ert/issues/4656

sondreso avatar Feb 07 '23 09:02 sondreso