tobac
tobac copied to clipboard
Switch feature detection to use xarray internally
This is a draft PR that contains my work switching feature detection to use xarray internally, using many of the concepts (and some of the code) from #275. This code uses entirely xarray and pandas for processing feature detection, and removes all iris calls; instead using our internal conversion utilities to convert input iris data to xarray (which should always work, unlike xarray
->iris
. Further, this change should allow users to provide their data as an xarray.DataArray
.
This PR will remain in draft form until at least #293 is merged and intentionally includes those changes.
More tests (particularly around times on different OS versions) should be added, but I wanted to get this into a form that we could discuss.
- [x] Have you followed our guidelines in CONTRIBUTING.md?
- [x] Have you self-reviewed your code and corrected any misspellings?
- [x] Have you written documentation that is easy to understand?
- [x] Have you written descriptive commit messages?
- [x] Have you added NumPy docstrings for newly added functions?
- [x] Have you formatted your code using black?
- [x] If you have introduced a new functionality, have you added adequate unit tests?
- [x] Have all tests passed in your local clone?
- [ ] If you have introduced a new functionality, have you added an example notebook?
- [ ] Have you kept your pull request small and limited so that it is easy to review?
- [x] Have the newest changes from this branch been merged?
Codecov Report
Attention: Patch coverage is 78.20513%
with 68 lines
in your changes missing coverage. Please review.
Project coverage is 62.27%. Comparing base (
5f82097
) to head (bbb6b67
).
Files | Patch % | Lines |
---|---|---|
tobac/utils/internal/iris_utils.py | 63.63% | 52 Missing :warning: |
tobac/utils/internal/xarray_utils.py | 88.88% | 14 Missing :warning: |
tobac/utils/general.py | 85.71% | 2 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## RC_v1.6.0 #354 +/- ##
=============================================
+ Coverage 60.91% 62.27% +1.36%
=============================================
Files 23 23
Lines 3541 3690 +149
=============================================
+ Hits 2157 2298 +141
- Misses 1384 1392 +8
Flag | Coverage Δ | |
---|---|---|
unittests | 62.27% <78.20%> (+1.36%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Okay, this is now ready for review in my opinion. This PR contains changes from #380.
I'd like feedback (and testing) everywhere, but in particular, I'd like feedback on the new add_coordinates_to_features
function in tobac/utils/internal/xarray_utils.py
. I had to come up with a clever solution, and I really hate clever solutions.
Okay, this is now ready for review in my opinion. This PR contains changes from #380.
I'd like feedback (and testing) everywhere, but in particular, I'd like feedback on the new
add_coordinates_to_features
function intobac/utils/internal/xarray_utils.py
. I had to come up with a clever solution, and I really hate clever solutions.
Aha, multidimensional interpolation with xarray is a massive pain...
Okay, this is now ready for review in my opinion. This PR contains changes from #380. I'd like feedback (and testing) everywhere, but in particular, I'd like feedback on the new
add_coordinates_to_features
function intobac/utils/internal/xarray_utils.py
. I had to come up with a clever solution, and I really hate clever solutions.Aha, multidimensional interpolation with xarray is a massive pain...
The issue actually came about because we have i,j,k
coordinates, and there doesn't appear to be a way to interpolate along those coordinates if a coord and dim share a name. The multidimensional aspect actually wasn't too bad as it only has to interpolate a single point at a time.
Okay, this is now ready for review in my opinion. This PR contains changes from #380. I'd like feedback (and testing) everywhere, but in particular, I'd like feedback on the new
add_coordinates_to_features
function intobac/utils/internal/xarray_utils.py
. I had to come up with a clever solution, and I really hate clever solutions.Aha, multidimensional interpolation with xarray is a massive pain...
The issue actually came about because we have
i,j,k
coordinates, and there doesn't appear to be a way to interpolate along those coordinates if a coord and dim share a name. The multidimensional aspect actually wasn't too bad as it only has to interpolate a single point at a time.
Ah I see what you mean, it's because of interpolating based on array index location rather than coord value
Could this be solved using DataArray.swap_dims?:
variable_da = variable_da.swap_dims({"time":"frame", "south_north":"hdim_1", "west_east":"hdim_2"})
variable_da.interp(
coords=dict(
frame=xr.DataArray(Features["frame"].values, dims="feature"),
hdim_1=xr.DataArray(Features["hdim_1"].values, dims="feature"),
hdim_2=xr.DataArray(Features["hdim_2"].values, dims="feature"),
)
)
Suggestion was found via this still open xarray issue from 4 years ago... https://github.com/pydata/xarray/issues/3343
Could this be solved using DataArray.swap_dims?:
variable_da = variable_da.swap_dims({"time":"frame", "south_north":"hdim_1", "west_east":"hdim_2"}) variable_da.interp( coords=dict( frame=xr.DataArray(Features["frame"].values, dims="feature"), hdim_1=xr.DataArray(Features["hdim_1"].values, dims="feature"), hdim_2=xr.DataArray(Features["hdim_2"].values, dims="feature"), ) )
Suggestion was found via this still open xarray issue from 4 years ago... pydata/xarray#3343
Yes, good catch. I hadn't seen that; I came up with the same pathway, but was doing a DataArray
->Dataset
conversion first because I thought it would be named something sensible like rename_dims
. I've revised this now.
Linting results by Pylint:
Your code has been rated at 8.70/10 (previous run: 8.70/10, +0.00) The linting score is an indicator that reflects how well your code version follows Pylint’s coding standards and quality metrics with respect to the RC_v1.6.0 branch. A decrease usually indicates your new code does not fully meet style guidelines or has potential errors.
I have started my review on this, along with #380 , but I think it would be good to discuss whether we want to do refactoring during the switch to xarray or afterwards. In particular I think we need to make sure xarray_utils
is in the best state possible as it will be important for refactoring segmentation and tracking, but I'm not sure how much the current implementation here is limited by needing to match iris_utils
capability.
I'm also not sure that conversion of datetime etc inside feature detection is the best way to go. In the current implementation the converted datetime to cftime doesn't match to iris input, as it doesn't have information about the calendar. I think it may be easier and clearer to do the conversion in the decorator, and keep the function unaware of whether the input data is being converted or not, but there may be cases were the other way around is easier. Will be good to discuss this tomorrow!
Note to self: this depends on #380 to merge.
Got about halfway through the review comments... Next few will take a bit longer.
OK, I have resolved all review comments here. @w-k-jones ready for a second review. We will need another review from someone else, too, but this is not urgent as I need to get #380 resolved first.
Great, thanks for making the revisions. I've had a look through the changes and tested the code with both xarray and iris locally and have no issues to issues. Should we start an
RC_v1.6.0
branch to merge this into?
Yes, we will need to do that. I'll wait until we have to though to minimize the amount of time we have to maintain separate branches. We still need one more reviewer, @JuliaKukulies or @kelcyno are you up for it?
Great, thanks for making the revisions. I've had a look through the changes and tested the code with both xarray and iris locally and have no issues to issues. Should we start an
RC_v1.6.0
branch to merge this into?Yes, we will need to do that. I'll wait until we have to though to minimize the amount of time we have to maintain separate branches. We still need one more reviewer, @JuliaKukulies or @kelcyno are you up for it?
I can do the second review within the next couple of days!
I also started a RC_v1.6.0
branch to merge this PR into. We can wait until the release of v1.5.3
to update the branch before this PR is merged. But I will try to review asap :)
Ok, @JuliaKukulies, I think I've addressed your concerns. Ready for your re-review. Hopefully we can get this merged!
Excellent, @freemansw1! This looks good except for that there is still the issue with preserve_iris_datetime_type
which does not result in the same datetime format as when I provide iris input when this parameters is True
(see my last comment).
Sorry I had not marked the bug at the actual code locations, so I get that you might have overseen this. I also cannot really figure out where this bug comes from, since everywhere in the code where preserve_iris_datetime_type
is used looks like it should work. Could you reproduce this or is this an issue with my test data?
Excellent, @freemansw1! This looks good except for that there is still the issue with
preserve_iris_datetime_type
which does not result in the same datetime format as when I provide iris input when this parameters isTrue
(see my last comment).Sorry I had not marked the bug at the actual code locations, so I get that you might have overseen this. I also cannot really figure out where this bug comes from, since everywhere in the code where
preserve_iris_datetime_type
is used looks like it should work. Could you reproduce this or is this an issue with my test data?
Apologies, yeah, I had missed this. I will work to reproduce and try this out this weekend/early next week.
I think I've noticed this before, and the issue is to do with calculating the timesteps from cftime
format. I think that in the current versions of tobac the time is also returned with an object dtype, but I would be happier to change it to datetime64 in all cases as I currently have to do this for most use cases...
Note to self: add_coords
for xarray fails when having coordinates to interpolate along dimensions other than x/y.
OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically, xarray
only converts to datetime64
if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).
Note to self:
add_coords
for xarray fails when having coordinates to interpolate along dimensions other than x/y.
I have now also resolved this issue. The solution here results in some decisions that were contrary to the discussion we had at the last dev meeting (notably that len(1) coordinates are still appended to the dataframe), but I think we are otherwise okay. We can talk about this at the dev meeting tomorrow.
@w-k-jones and @JuliaKukulies I am now ready for a re-review.
OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically,
xarray
only converts todatetime64
if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).
OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:
-
for xarray input and
preserve_iris_datetime_type = True
->datetime64
-
for xarray input and
preserve_iris_datetime_type = False
->datetime64
-
for iris input and
preserve_iris_datetime_type = True
->object (cftime.DatetimeGregorian)
-
for iris input and
preserve_iris_datetime_type = False
->object (cftime.DatetimeGregorian)
Is that what we want?
OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically,
xarray
only converts todatetime64
if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:
- for xarray input and
preserve_iris_datetime_type = True
->datetime64
- for xarray input and
preserve_iris_datetime_type = False
->datetime64
- for iris input and
preserve_iris_datetime_type = True
->object (cftime.DatetimeGregorian)
- for iris input and
preserve_iris_datetime_type = False
->object (cftime.DatetimeGregorian)
Is that what we want?
Ok, the first of those three are all correct, but the last one isn't.
OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically,
xarray
only converts todatetime64
if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:
- for xarray input and
preserve_iris_datetime_type = True
->datetime64
- for xarray input and
preserve_iris_datetime_type = False
->datetime64
- for iris input and
preserve_iris_datetime_type = True
->object (cftime.DatetimeGregorian)
- for iris input and
preserve_iris_datetime_type = False
->object (cftime.DatetimeGregorian)
Is that what we want?
Ok, the first of those three are all correct, but the last one isn't.
Thats what I thought, and we would expect datetime64
for the last one, right? Have you tested this with the data I sent you or another dataset? Sorry to cause more struggle..
OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically,
xarray
only converts todatetime64
if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:
- for xarray input and
preserve_iris_datetime_type = True
->datetime64
- for xarray input and
preserve_iris_datetime_type = False
->datetime64
- for iris input and
preserve_iris_datetime_type = True
->object (cftime.DatetimeGregorian)
- for iris input and
preserve_iris_datetime_type = False
->object (cftime.DatetimeGregorian)
Is that what we want?
Ok, the first of those three are all correct, but the last one isn't.
Thats what I thought, and we would expect
datetime64
for the last one, right? Have you tested this with the data I sent you or another dataset? Sorry to cause more struggle..
Ok this one was easy- we didn't actually expect the kwarg in feature_detection_multithreshold
. I've fixed that.
OK, I think I have resolved the datetime issues. It's somewhat of a hacky solution, but basically,
xarray
only converts todatetime64
if the time received is in Gregorian time. I now check for this and convert back to Gregorian time. For all other calendars (thanks @harrietgilmour for helping with this part), xarray leaves it as an object (which does still break our code, but alas).OK, I have tested this again with my data and I thought I understood what you explained in the dev meeting today, but now i am a little confused again :) I basically get:
- for xarray input and
preserve_iris_datetime_type = True
->datetime64
- for xarray input and
preserve_iris_datetime_type = False
->datetime64
- for iris input and
preserve_iris_datetime_type = True
->object (cftime.DatetimeGregorian)
- for iris input and
preserve_iris_datetime_type = False
->object (cftime.DatetimeGregorian)
Is that what we want?
Ok, the first of those three are all correct, but the last one isn't.
Thats what I thought, and we would expect
datetime64
for the last one, right? Have you tested this with the data I sent you or another dataset? Sorry to cause more struggle..Ok this one was easy- we didn't actually expect the kwarg in
feature_detection_multithreshold
. I've fixed that.
Perfect, thanks for looking into this and fixing this. I am happy to approve this now once the latest changes from RC_v1.5.x are merged in
OK, changes all merged in and everything. @JuliaKukulies and @w-k-jones are you happy for me to merge into RC_v1.6.0
?
OK, changes all merged in and everything. @JuliaKukulies and @w-k-jones are you happy for me to merge into
RC_v1.6.0
?
Yes, please go ahead:) !
Also happy for this to be merged :)