oceanspy
oceanspy copied to clipboard
plot arrays across faces without cutout
This issue closes #410 and #409
The following are tasks that need to be completed before merging this PR. Until then, consider this as a work in progress
- [x] Add plot function to code base that works.
- [x] Include some tests.
- [x] Update API / documentation that clearly demonstrate how it works.
- [x] Include a link to an example notebook
I will work on completing the rest of the tasks sometime this week, or the weekend at the latest.
Codecov Report
Attention: Patch coverage is 68.36158%
with 56 lines
in your changes are missing coverage. Please review.
Project coverage is 95.62%. Comparing base (
9e566fc
) to head (ed9f249
). Report is 5 commits behind head on main.
Files | Patch % | Lines |
---|---|---|
oceanspy/plot.py | 68.36% | 36 Missing and 20 partials :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #427 +/- ##
==========================================
- Coverage 96.59% 95.62% -0.97%
==========================================
Files 9 9
Lines 4987 5163 +176
Branches 1190 1246 +56
==========================================
+ Hits 4817 4937 +120
- Misses 97 133 +36
- Partials 73 93 +20
Flag | Coverage Δ | |
---|---|---|
unittests | 95.62% <68.36%> (-0.97%) |
:arrow_down: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Hey Tom. catalog_ECCO.yml
it is just a catalog on my local computer that I use for development since it points towards the test data from oceanspy. I think it might be better to run the notebook on sciserver. I will make the proper modifications to do that (3 cells)
@ThomasHaine I changed a bit the notebook in the hyper link above to run in Sciserver, it also has description on how to install the correct oceanspy version (this branch) in a sciserver environment, and also added improved description of the notebook. There are about 20 examples.
The notebook successfully ran in Sciserver with ECCO. I'd expect the same with LLC4320.
@Mikejmnez thanks. I'll check...
With LLC4320 it errors at the first plot with:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File <timed exec>:8
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1474, in _plotMethods.faces_array(self, **kwargs)
1472 @_functools.wraps(faces_array)
1473 def faces_array(self, **kwargs):
-> 1474 return faces_array(self._od, **kwargs)
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1127, in faces_array(od, Ymoor, Xmoor, varName, xoak_index, face2axis, **kwargs)
1121 if xoak_index not in _xoak.IndexRegistry():
1122 raise ValueError(
1123 "`xoak_index` [{}] is not supported."
1124 "\nAvailable options: {}"
1125 "".format(xoak_index, _xoak.IndexRegistry())
1126 )
-> 1127 ds_grid.xoak.set_index(["XC", "YC"], xoak_index)
1128 cdata = {"XC": ("mooring", Xmoor), "YC": ("mooring", Ymoor)}
1129 ds_data = _xr.Dataset(cdata) # mooring data
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in XoakAccessor.set_index(self, coords, index_type, persist, **kwargs)
98 self._index_type = index_type
99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
103 if len(set([c.dims for c in coord_objs])) > 1:
104 raise ValueError(
105 'Coordinates {coords} must all have the same dimensions in the same order'
106 )
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in <listcomp>(.0)
98 self._index_type = index_type
99 self._index_coords = tuple(coords)
--> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords]
103 if len(set([c.dims for c in coord_objs])) > 1:
104 raise ValueError(
105 'Coordinates {coords} must all have the same dimensions in the same order'
106 )
File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/coordinates.py:289, in DatasetCoordinates.__getitem__(self, key)
287 def __getitem__(self, key: Hashable) -> DataArray:
288 if key in self._data.data_vars:
--> 289 raise KeyError(key)
290 return self._data[key]
KeyError: 'XC'
@Mikejmnez any ideas about how to fix?
With LLC4320 it errors at the first plot with:
--------------------------------------------------------------------------- KeyError Traceback (most recent call last) File <timed exec>:8 File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1474, in _plotMethods.faces_array(self, **kwargs) 1472 @_functools.wraps(faces_array) 1473 def faces_array(self, **kwargs): -> 1474 return faces_array(self._od, **kwargs) File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/oceanspy/plot.py:1127, in faces_array(od, Ymoor, Xmoor, varName, xoak_index, face2axis, **kwargs) 1121 if xoak_index not in _xoak.IndexRegistry(): 1122 raise ValueError( 1123 "`xoak_index` [{}] is not supported." 1124 "\nAvailable options: {}" 1125 "".format(xoak_index, _xoak.IndexRegistry()) 1126 ) -> 1127 ds_grid.xoak.set_index(["XC", "YC"], xoak_index) 1128 cdata = {"XC": ("mooring", Xmoor), "YC": ("mooring", Ymoor)} 1129 ds_data = _xr.Dataset(cdata) # mooring data File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in XoakAccessor.set_index(self, coords, index_type, persist, **kwargs) 98 self._index_type = index_type 99 self._index_coords = tuple(coords) --> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords] 103 if len(set([c.dims for c in coord_objs])) > 1: 104 raise ValueError( 105 'Coordinates {coords} must all have the same dimensions in the same order' 106 ) File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xoak/accessor.py:101, in <listcomp>(.0) 98 self._index_type = index_type 99 self._index_coords = tuple(coords) --> 101 coord_objs = [self._xarray_obj.coords[cn] for cn in coords] 103 if len(set([c.dims for c in coord_objs])) > 1: 104 raise ValueError( 105 'Coordinates {coords} must all have the same dimensions in the same order' 106 ) File ~/mambaforge/envs/Oceanography/lib/python3.10/site-packages/xarray/core/coordinates.py:289, in DatasetCoordinates.__getitem__(self, key) 287 def __getitem__(self, key: Hashable) -> DataArray: 288 if key in self._data.data_vars: --> 289 raise KeyError(key) 290 return self._data[key] KeyError: 'XC'
Hey @ThomasHaine! Apologies, I thought I wrote but my message stayed as draft.
The issue with LLC4320 is a non-issue really. The Keyerror means that the variable XC
(and YC
) are not defined as coordinates for the xarray.dataset. This is fixed by setting them as coordinates, for example:
ds = ds.set_coords(['XC', 'YC'])
that does it. Otherwise xoak
can't find them.
However, last week as I ran this code with LLC4320 on SciServer (Oceanography (Integrated Viewer)) and found that:
-
it remains pretty slow. Most of the time is on building the tree with
xoak
. I tried loading the variablesXC, YC
into memory (since these are 2D) but it didn't work. It has to do with building the tree everytime there is a query for coordinates. I need to work on perhaps reusing the build of the tree (viaxoak
) so that we don't continuously re-build it, which seems unnecessary. Perhaps once Grendel is ready things will be faster and it will be a non-issue. -
The plotting function has some bug or something. When I use the same coordinates from the ECCO examples but now on LLC4320, the plots are off, which is weird since both datasets have the same topology.
I need to spend some time on it. I am not too concern about 2), but 1) is a bummer (that is slow with LLC4320). Again, perhaps with Grendel things will run much much faster. Right now I am using the basic Oceanography (Integrated Viewer) volume on Sciserver (so pretty basic).
Thanks @Mikejmnez. I find the same issues (1) and (2). Grendel is still available, so it may be possible to test now. Otherwise, the data transfer to the new ceph cluster will happen soon (I hope).
Is it possible to build the xoak
tree once and store it?
Is it possible to build the
xoak
tree once and store it?
Not sure, but if it is that would significantly speed up the plotting in this PR, subsample.mooring_array
and subsample.stations
!! I need to spend some time exploring these options, and finding the bug on the plotting func in this PR.
2. The plotting function has some bug or something. When I use the same coordinates from the ECCO examples but now on LLC4320, the plots are off, which is weird since both datasets have the same topology.
Fixed this problem with 2) above. It was minor issue and now plotting works as expected with LLC4320, DYAMOND and all other LLC-geometries. You can find some examples with LLC4320 (and how long it took to run) on this notebook.
note
-
I had to download LLC4320 and DYAMOND data to my local computer in zarr format, to speed things up. The 2D Grid and single snapshot of ETA. And so the notebooks run locally to the data, which is as fast as it will get on a single (8core) computer.
-
I did some performance test of xoak on my local computer. Not bad: xoak builds in about
20
secs with LLC430 (4
secs with DYAMOND) and so things are pretty fast when data is local to the compute. After than, selecting range of coordinates is pretty fast. I didn't test if storing the grid after xoak was built bypasses the need to build xoak within the grid dataset everytime. Will check that soon. -
I need to do some more benchmark: preliminary data on this notebook.
-
Another thing that slows down things on LLC4320 is the amount of points to be plotted.
4320^2
per face, and when there are multiple faces involved, many more grid points. And so a lot of the time spent by the function was simply plotting (up to ~1 minute). And that is when data is local to compute notebook. On Sciserver (at least the interactive node) I would expect things to be much slower. But perhaps with Grendel (optimize transfer of data) things will be much faster.
TODO: Still need to write tests and documentation (docstring
).
This PR is ready for merger!
Thanks @Mikejmnez . I'll take a look and review, then you can merge.
Thanks @Mikejmnez . I'll take a look and review, then you can merge.
Great. I updated the example notebook at the top of this PR conversation so you can see examples with LLC4320. You can see the timings there when the grid dataset is stored in zarr in my machine, next to the notebook.
The ds_grid
+ xoak
combination is being build everytime there is a plot, and so those timings can probably be improved a bit. The notebook can be considered the baseline when data is stored locally next to compute environment.
@ThomasHaine BTW you should re-install oceanspy
from the branch, since the code has changed since last time you installed it. The instructions to do that are in the example notebook that is referenced at the beginning of the PR conversation.
NOTE: The notebook uses a yaml
file that is also available from the GH repository, but that yaml
file defines the dataset locally to the notebook. To run that notebook from within Sciserver, you should instead use oceanspy's default catalog:
cell [4]: ECCOod = ospy.open_oceandataset.from_catalog("ECCO")
and
cell[6]: LLC4320od = ospy.open_oceandataset.from_catalog("LLC4320")
@Mikejmnez I checked the notebook with ECCO
and LLC4320
on SciServer, and all works fine. Please proceed with the merge!