Revised EnsembleSummaryProvider interface
Refined the EnsembleSummaryProvider interface to support finer control over which range of dates gets returned when using lazy resampling and requesting data for multiple realizations.
Previously, when using lazy resampling and requesting vector data for multiple realizations through EnsembleSummaryProvider.get_vectors_df(), each returned realization would contain the smallest possible range of dates that still covered the dates in that realization's raw data. This PR makes it possible to force the date range of all the returned realizations to be the same, either by truncating or by extending the range of dates, so that all realizations contain the same dates.
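To illustrate the difference between per-realization date ranges and a common date range, here is a minimal standalone sketch. It does not use the provider code; the rounding to month starts, the helper names and the `"union"`/`"intersection"` mode strings are assumptions made purely for illustration.

```python
from __future__ import annotations

from datetime import date


def next_month(d: date) -> date:
    """First day of the month following the month that contains d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def month_floor(d: date) -> date:
    """First day of the month that contains d."""
    return d.replace(day=1)


def month_ceil(d: date) -> date:
    """d itself if it is a month start, otherwise the next month start."""
    return d if d.day == 1 else next_month(d)


def monthly_range(start: date, end: date) -> list[date]:
    """All month starts from start to end, both inclusive."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current = next_month(current)
    return dates


def common_span(spans: list[tuple[date, date]], mode: str) -> tuple[date, date]:
    """Combine per-realization (start, end) spans into one shared span.

    mode is a hypothetical switch: "union" extends to cover every
    realization, "intersection" truncates to what all realizations share.
    """
    starts = [start for start, _ in spans]
    ends = [end for _, end in spans]
    if mode == "union":
        return min(starts), max(ends)
    if mode == "intersection":
        return max(starts), min(ends)
    raise ValueError(f"unknown mode: {mode}")


# Made-up raw data spans for two realizations.
raw_spans = {
    0: (date(2020, 1, 15), date(2020, 6, 10)),
    1: (date(2020, 3, 1), date(2020, 8, 20)),
}

# Smallest monthly span that still covers each realization's raw dates.
per_real_spans = {
    real: (month_floor(start), month_ceil(end))
    for real, (start, end) in raw_spans.items()
}

union_dates = monthly_range(*common_span(list(per_real_spans.values()), "union"))
intersection_dates = monthly_range(
    *common_span(list(per_real_spans.values()), "intersection")
)
```

In this sketch the two realizations get different monthly ranges on their own, while the union and intersection variants each yield a single date axis that could be applied to every realization.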
The new behavior can be controlled through the common_date_span property of the ResamplingOptions options object that is now accepted by EnsembleSummaryProvider.get_vectors_df(). Please see documentation for ResamplingOptions for more details.
The EnsembleSummaryProvider.dates() function has also been extended with an additional argument, date_span, that controls whether the union or the intersection of dates over realizations is returned.
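The union and intersection of dates over realizations can be sketched with plain Python sets. This is only an illustration of the two concepts; `combine_dates` and its `how` argument are hypothetical and are not the signature of EnsembleSummaryProvider.dates().

```python
from __future__ import annotations

from datetime import date


def combine_dates(dates_by_real: dict[int, list[date]], how: str) -> list[date]:
    """Return the sorted union or intersection of dates over realizations.

    how is a hypothetical switch taking "union" or "intersection".
    """
    date_sets = [set(dates) for dates in dates_by_real.values()]
    if not date_sets:
        return []
    if how == "union":
        combined = set.union(*date_sets)
    elif how == "intersection":
        combined = set.intersection(*date_sets)
    else:
        raise ValueError(f"unknown how: {how}")
    return sorted(combined)


# Made-up dates for two realizations with partially overlapping spans.
dates_by_real = {
    0: [date(2020, 1, 1), date(2020, 2, 1), date(2020, 3, 1)],
    1: [date(2020, 2, 1), date(2020, 3, 1), date(2020, 4, 1)],
}

all_dates = combine_dates(dates_by_real, "union")
shared_dates = combine_dates(dates_by_real, "intersection")
```

Here the union spans January through April, while the intersection keeps only the dates present in both realizations (February and March).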
Note that this is a breaking change in the EnsembleSummaryProvider interface that affects the following member functions:
- EnsembleSummaryProvider.dates()
- EnsembleSummaryProvider.get_vectors_df()
As an example of code changes needed in the client code, consider the case where we are requesting lazy resampling of two vectors, for all realizations, with a MONTHLY resampling frequency. Currently this would be accomplished like this:
vecdf = myprovider.get_vectors_df(
    vector_names=["VECA", "VECB"],
    resampling_frequency=Frequency.MONTHLY,
    realizations=None,
)
With the revised interface, the client code would have to be changed to something like this:
vecdf = myprovider.get_vectors_df(
    vector_names=["VECA", "VECB"],
    resampling_options=ResamplingOptions(frequency=Frequency.MONTHLY, common_date_span=None),
    realizations=None,
)
Contributor checklist
- [x] :robot: I have added tests, or extended existing tests, to cover any new features or bugs fixed in this PR.
- [x] :book: I have considered adding a new entry in CHANGELOG.md, and added it if it should be communicated there.
When any kind of statistics is calculated, we need to take either the union or the intersection of the dates across realizations (to get a common date axis). For cumulative vectors we can take the union (since these have a well-defined extrapolation). For all other vector types we take the intersection (since these are unsafe to extrapolate; we could potentially also extrapolate rate vectors by assuming they are zero outside the data span, but I'm not sure whether we currently have the metadata needed to differentiate between rate vectors and e.g. pressure vectors, @asnyv?).
@anders-kiaer We don't have the metadata, but I think a date span that varies between the statistics of different vectors could be a bit confusing for users, so maybe we should consider going for the union for all vectors?