tobac
tobac copied to clipboard
Minimally invasive workflow for xarray transition
this is a draft PR to document our collaborative activities in the xarray to iris transition. The idea for a step-by-step transition to xarray is laid out in #143 . This is also a test for full collaborative development, in contrast to the earlier strategy with a single feature prepared by one developer. Let's see ....
I suggest:
-
that the people that work on a certain module indicate what they currently do and also plan to do in a comment to this draft PR, maybe making a checkbox list of already migrated functions within that module
-
that everybody runs testing locally with the standard tests and the notebooks from the example directory using
tobac> python -m pytest tobac> jupyter nbconvert --to notebook --inplace --execute examples/Example_*/*.ipynb
Codecov Report
Patch coverage: 100.00
% and project coverage change: +1.49
:tada:
Comparison is base (
a0fd1ff
) 47.47% compared to head (7e9b328
) 48.96%.
Additional details and impacted files
@@ Coverage Diff @@
## RC_v1.5.0 #275 +/- ##
=============================================
+ Coverage 47.47% 48.96% +1.49%
=============================================
Files 14 14
Lines 2711 2851 +140
=============================================
+ Hits 1287 1396 +109
- Misses 1424 1455 +31
Flag | Coverage Δ | |
---|---|---|
unittests | 48.96% <100.00%> (+1.49%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
Impacted Files | Coverage Δ | |
---|---|---|
tobac/feature_detection.py | 83.16% <100.00%> (+0.16%) |
:arrow_up: |
tobac/testing.py | 96.51% <100.00%> (+0.57%) |
:arrow_up: |
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.
I started work on feature_detection.py
.
- [ ] feature_detection_multithreshold
- [x] feature_detection_multithreshold_timestep
- [x] feature_detection_threshold
- [x] feature_position
- [x] filter_min_distance
- [x] remove_parents
- [x] test_overlap
notes: def feature_detection_threshold
is using numpy
arrays as input type -> this is inconsistent with our interface strategy -> however, I keep it for the moment.
OK, tests are in place for function feature_detection_multithreshold
.
However, they fail because xarray
input is not convert into the correct iris.Cube
format. .coords
is empty:
if "time" not in [coord.name() for coord in field_in.coords()]:
> raise ValueError(
"input to feature detection step must include a dimension named 'time'"
)
E ValueError: input to feature detection step must include a dimension named 'time'
Two options:
- correct the respective wrapper in utils
- proceed and remove iris internals in
feature_detection_multithreshold
such that.coords()
is not used anymore
I favor option 2.
@freemansw1 , @w-k-jones , @JuliaKukulies : What do you think?
I think this is failing because testing.make_dataset_from_arr
doesn't actually set any coordinate names. I'm not even sure why it works in the first place, given that it initially creates an xarray.DataArray
and then converts it to iris
, but we should probably explicitly set coordinate names.
Overall, I think we need to create a robust xarray/iris conversion method before we convert any of the internals so that we can ensure that output/behaviour is identical before and after the switch. I've starting making a list of elements we need to add on top of the existing xarray
conversion methods, which includes:
- Coordinate/dim/variable name uniformity. I think that
xarray.to_iris
uses thestandard_name
attribute of variables when converting toiris
, which can be different to their name inxarray
. This isn't symmetric in the other direction either! - Handling times, as it's a mess in
iris
. Proleptic gregorian calendar nonsense etc.
We are not setting coordinate names in testing.making_dataset_from_arr
in the current version, but we are also not testing feature_detection_multithreshold
.
In RC_v1.5.0
, we are actually setting the coordinate name for the time dimension, but only if the datatype is set to iris
(see here):
if has_time:
out_arr_iris.add_dim_coord(
iris.coords.DimCoord(
pd.date_range(start=time_min, periods=time_num)
.values.astype("datetime64[s]")
.astype(int),
standard_name="time",
units="seconds since epoch",
),
time_dim_num,
)
The tests in the current version do not fail because we have only tested with iris cubes. So agree with @w-k-jones , that we should set the coordinate names when the test dataset output is set to be a xarray.DataArray
.
Also agree that it makes sense to start working on more robust conversion routines, and working with the decorators as suggested by @fsenf might also help to identify more of the oddities that are currently covered. Or were you suggesting to fix the conversion routines entirely before continuing with this strategy, @w-k-jones ?
@JuliaKukulies : maybe I misunderstood, but we actually have two testing function for feature_detection_multithreshold
starting in line
def test_feature_detection_multiple_z_coords(
)
def test_feature_detection_setting_multiple():
)
I extended the first test function to also test for xarray and iris as part of the transition to xarray. I will look at the test data soon to find a solution.
@fsenf, sorry I figured my comment was a bit unclear. I was just saying that these test functions have only been introduced with the 3D changes. Hence, they exist in the RC_v1.5.0
branch, but not in current main
. That is not relevant for the changes you made, but was just a note as to why testing.making_dataset_from_arr
is not yet equipped with setting the coordinate names for both iris and xarray.
The code snippet I posted was from RC_v1.5.0
and shows why the tests you mention currently work: for iris test data we are explicitly setting the time dimension and we are currently only testing iris. With your changes, we also test xarray which is not covered by the code. I think the easiest for now would be to set the coordinate names for xarray test data sets and then make sure the conversion routine is robust in converting these correctly. Your option 2 would already internally change the code which wanted to stay away for the moment or not?
@JuliaKukulies : Thanks for the hint! make_dataset_from_arr
is now adjusted to output xarray with coordinates properly!
Excellent, @fsenf !
FWIW, I now have a mostly working copy of feature detection fully transitioned to xarray from iris, using many of the same principles as discussed here, including a migration of add_coordinates
to xarray, which turned out to be a more difficult problem than I had imagined! I need to incorporate some of the tests here, but I wanted to pass it along. I'm happy to commit to the exp
branch or do a separate PR into either the exp
branch or into the RC_v1.5.x
branch.
I still have some more testing and validation to do, and some of the folks running tracking on large datasets at CSU have volunteered to do more testing on this. Once I finish my testing, I'll be happy to contribute it back.