uxarray icon indicating copy to clipboard operation
uxarray copied to clipboard

Non-Conservative Zonal Mean

Open philipc2 opened this issue 1 year ago • 117 comments
trafficstars

Closes #93

Overview

  • Implement Non-Conservative Zonal Mean.
  • Bug fixes for _get_zonal_faces_weight_at_constLat algorithm
  • Bug fixes for gca_constlat_intersection algorithm
  • Bug fixes for _pole_point_inside_polygon algorithm

Expected Usage

import uxarray as ux

grid_path = "/path/to/grid.nc"
data_path = "/path/to/data.nc"

uxds = ux.open_dataset(grid_path, data_path)

# compute the zonal average for the default start_lat_deg=-90, end_lat_deg=90, step_deg=5
zonal_result = uxds['data_var'].zonal_mean()

# Compute the zonal mean for a single latitude:
zonal_result = uxds["t2m"].zonal_mean(30.0)

PR Checklist

General

  • [x] An issue is linked created and linked
  • [x] Add appropriate labels
  • [x] Filled out Overview and Expected Usage (if applicable) sections

Testing

  • [x] Adequate tests are created if there is new functionality
  • [x] Tests cover all possible logical paths in your function
  • [x] Tests are not too basic (such as simply calling a function and nothing else)

Documentation

  • [x] Docstrings have been added to all new functions
  • [x] Docstrings have updated with any function changes
  • [x] Internal functions have a preceding underscore (_) and have been added to docs/internal_api/index.rst
  • [x] User functions have been added to docs/user_api/index.rst

Examples

  • [x] Any new notebook examples added to docs/examples/ folder
  • [x] Clear the output of all cells before committing
  • [x] New notebook files added to docs/examples.rst toctree
  • [ ] New notebook files added to new entry in docs/gallery.yml with appropriate thumbnail photo in docs/_static/thumbnails/

philipc2 avatar May 10 '24 18:05 philipc2

Hi @hongyuchen1030

I've set up the initial boilerplate for your implementation. If you'd like to discuss anything about the implementation before getting started, we can chat about it here.

One question though: will this implementation be the conservative or non-conservative one?

philipc2 avatar May 10 '24 18:05 philipc2

Hi @hongyuchen1030

I've set up the initial boilerplate for your implementation. If you'd like to discuss anything about the implementation before getting started, we can chat about it here.

One question though: will this implementation be the conservative or non-conservative one?

Thank you very much for your great help! I plan to implement the non-conservative one.

And can you also fill up the expected usage for me please? so that I have a better idea to start? Thanks

hongyuchen1030 avatar May 10 '24 22:05 hongyuchen1030

Hi @hongyuchen1030 I've set up the initial boilerplate for your implementation. If you'd like to discuss anything about the implementation before getting started, we can chat about it here. One question though: will this implementation be the conservative or non-conservative one?

Thank you very much for your great help! I plan to implement the non-conservative one.

And can you also fill up the expected usage for me please? so that I have a better idea to start? Thanks

Should be good now.

I wanted to ask and see how you think this type of functionality will work on data variables mapped to different elements. For example, we want to support data mapped to:

  • Nodes
  • Edges
  • Faces

Does your algorithm work on all these of these?

philipc2 avatar May 13 '24 19:05 philipc2

Hi @hongyuchen1030 I've set up the initial boilerplate for your implementation. If you'd like to discuss anything about the implementation before getting started, we can chat about it here. One question though: will this implementation be the conservative or non-conservative one?

Thank you very much for your great help! I plan to implement the non-conservative one. And can you also fill up the expected usage for me please? so that I have a better idea to start? Thanks

Should be good now.

I wanted to ask and see how you think this type of functionality will work on data variables mapped to different elements. For example, we want to support data mapped to:

  • Nodes
  • Edges
  • Faces

Does your algorithm work on all these of these?

A non-conservative zonal average is an average value on a constant latitude. So I will say it might be mapped to the faces on that latitude.

In your example function call, where is the place that users input latitude for query? And how do users know it's a non-conservative zonal average?

Such query should looks like res = grid.nc_zonal_mean(60, "Temperature")#60 degree for temperature mean

hongyuchen1030 avatar May 13 '24 19:05 hongyuchen1030

I may have worded the first part of my question incorrectly. I was asking whether your zonal average algorithm can compute the zonal average of data variables that either node, edge, or face centered. Typically the data variables are mapped to the faces, but less commonly data can be stored on each node or edge. For this application thought, I think it should only apply to edge or face centered data, since computing the zonal average of data on each node doesn't make much spatial sense.

For the second part, I can update it to reflect that (I put a very bare-bone outline)

Below is a zonal-average figure, with the righthand plot being the zonal average across the latitudes.

image

I think our UxDataArray.zonal_average()should compute the zonal average across a discrete range of lattitudes, something like -90 to 90 with a step size of 5 for example. This would generate a plot like the one above.

The example you gave would be computing the zonal average at a specific latitude. This should be a helper function under uxarray.core.zonal and can be called:

# helper function
_zonal_average_at_lattitude(data: np.ndarray, lat: float, conservative: bool = False)

This would contain the bulk of the artihmatic.

This would allow the one implemented at for the UxDataArray to call the helper depending on the user's parameters

# default, returns the zonal average for latitudes in the range [-90, 90] with a step size of 5
uxds['t2m'].zonal_average()

# specify range and step size
uxds['t2m'].zonal_average(lat = (-45, 45, 5))

# compute the zonal average for a single longitude
uxds['t2m'].zonal_average(lat = 45)


# parameter for conservative
uxds['t2m'].zonal_average(conservative=False)

Just a thought, we may want some Grid helper functions for getting the indices of the edges or faces that intersect a latitude line.

philipc2 avatar May 13 '24 19:05 philipc2

Here's a reference from NCL's zonal average page (which is where I got the figure above from) https://www.ncl.ucar.edu/Applications/zonal.shtml

philipc2 avatar May 13 '24 19:05 philipc2

I may have worded the first part of my question incorrectly. I was asking whether your zonal average algorithm can compute the zonal average of data variables that either node, edge, or face centered. Typically the data variables are mapped to the faces, but less commonly data can be stored on each node or edge. For this application thought, I think it should only apply to edge or face centered data, since computing the zonal average of data on each node doesn't make much spatial sense.

For the second part, I can update it to reflect that (I put a very bare-bone outline)

Below is a zonal-average figure, with the righthand plot being the zonal average across the latitudes.

image

I think our UxDataArray.zonal_average()should compute the zonal average across a discrete range of lattitudes, something like -90 to 90 with a step size of 5 for example. This would generate a plot like the one above.

The example you gave would be computing the zonal average at a specific latitude. This should be a helper function under uxarray.core.zonal and can be called:

# helper function
_zonal_average_at_lattitude(lat: float, conservative: bool = False)

This would contain the bulk of the artihmatic.

This would allow the one implemented at for the UxDataArray to call the helper depending on the user's parameters

# default, returns the zonal average for latitudes in the range [-90, 90] with a step size of 5
uxds['t2m'].zonal_average()

# specify range and step size
uxds['t2m'].zonal_average(lat = (-45, 45, 5))

# compute the zonal average for a single longitude
uxds['t2m'].zonal_average(lat = 45)


# parameter for conservative
uxds['t2m'].zonal_average(conservative=False)

Just a thought, we may want some Grid helper functions for getting the indices of the edges or faces that intersect a latitude line.

Thank you for your information. I was told that this function will be averaging the data for each face. So I only implemented for the scenario where: a data is attached to each face.

So I expected this current branch to be for a single zonal_average only, the helper function you mentioned. Once I finished this helper function, you can feel free to implement the discrete values function by calling my function.

hongyuchen1030 avatar May 13 '24 19:05 hongyuchen1030

Thank you for your information. I was told that this function will be averaging the data for each face. So I only implemented for the scenario where: a data is attached to each face.

Got it! Face-centered data is by far the most common scenario anyone will run into, so no worries.

So I expected this current branch to be for a single zonal_average only, the helper function you mentioned. Once I finished this helper function, you can feel free to implement the discrete values function by calling my function.

Sounds good, I can handle the UxDataArray implementation in this PR once you finish up the helper function.

I'll get the boilerplate updated with what we just discussed.

philipc2 avatar May 13 '24 19:05 philipc2

Thank you for your information. I was told that this function will be averaging the data for each face. So I only implemented for the scenario where: a data is attached to each face.

Got it! Face-centered data is by far the most common scenario anyone will run into, so no worries.

So I expected this current branch to be for a single zonal_average only, the helper function you mentioned. Once I finished this helper function, you can feel free to implement the discrete values function by calling my function.

Sounds good, I can handle the UxDataArray implementation in this PR once you finish up the helper function.

I'll get the boilerplate updated with what we just discussed.

Great! Thank you very much for your help!

hongyuchen1030 avatar May 13 '24 19:05 hongyuchen1030

Thank you for your information. I was told that this function will be averaging the data for each face. So I only implemented for the scenario where: a data is attached to each face.

Got it! Face-centered data is by far the most common scenario anyone will run into, so no worries.

So I expected this current branch to be for a single zonal_average only, the helper function you mentioned. Once I finished this helper function, you can feel free to implement the discrete values function by calling my function.

Sounds good, I can handle the UxDataArray implementation in this PR once you finish up the helper function. I'll get the boilerplate updated with what we just discussed.

Great! Thank you very much for your help!

Always happy to help!

philipc2 avatar May 13 '24 19:05 philipc2

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@hongyuchen1030

As part of this PR, I'll be putting together a user-guide section too.

philipc2 avatar May 16 '24 18:05 philipc2

Hi @philipc2,

I have a question regarding the expected usage of the zonal_mean in the PR description:

zonal_mean is a member function of UxDataArray class,

However, zonal_mean is accessed as UxDataset.zonal_mean in the PR description.

I checked api.py and noticed that it does not have a function similar to open_dataset that returns the UxDataArray class.

Could you please clarify if UxDataArray and UxDataset are both expected to be returned by open_dataset function in the future?

Thank you! --Amber

amberchen122 avatar May 17 '24 01:05 amberchen122

Hi @philipc2,

I have a question regarding the expected usage of the zonal_mean in the PR description:

zonal_mean is a member function of UxDataArray class,

However, zonal_mean is accessed as UxDataset.zonal_mean in the PR description.

I checked api.py and noticed that it does not have a function similar to open_dataset that returns the UxDataArray class.

Could you please clarify if UxDataArray and UxDataset are both expected to be returned by open_dataset function in the future?

Thank you! --Amber

Hi @amberchen122

uxds is an instance of a UxDataset, however once we access the data variable of interested, in our case uxds['data_var'], we are now working with a UxDataArray`.

You can think of a UxDataset as a collection of UxDataArrays

I'd suggest giving this section in our user guide a read, it should hopefully clarify it!

Let me know if you have any other questions!

philipc2 avatar May 17 '24 02:05 philipc2

Hi team,

  • I have completed the implementation of the zonal-mean function and its helper functions in zonal.py. I am now proceeding with testing and documentation. @philipc2, could you please provide any general guidelines for testing and documentation that I should follow?

  • I have assumed that the zonal-mean function handles only face-centered data. Could you confirm if this is correct? @philipc2 @hongyuchen1030 Link to code

  • When a constLat does not intersect any face, I have returned INT_FILL_VALUE as a placeholder value. If there is a preferred placeholder value, please let me know. Link to code

  • Additionally, regarding the zonal-mean API, should we allow users to specify the latitude range for which they want to compute the zonal mean?

Thank you for your help! :) --Amber

amberchen122 avatar May 21 '24 08:05 amberchen122

I attempted to run this on one of our sample meshes, but received an error.

base_path = "../../test/meshfiles/ugrid/outCSne30/"
grid_path = base_path + "outCSne30.ug"
data_path = base_path + "outCSne30_vortex.nc"
uxds = ux.open_dataset(grid_path, data_path)
uxds['psi'].zonal_mean()

ValueError: No intersections are found for the face, please make sure the build_latlon_box generates the correct results

philipc2 avatar May 21 '24 14:05 philipc2

Hi team,

  • I have completed the implementation of the zonal-mean function and its helper functions in zonal.py. I am now proceeding with testing and documentation. @philipc2, could you please provide any general guidelines for testing and documentation that I should follow?
  • I have assumed that the zonal-mean function handles only face-centered data. Could you confirm if this is correct? @philipc2 @hongyuchen1030 Link to code
  • When a constLat does not intersect any face, I have returned INT_FILL_VALUE as a placeholder value. If there is a preferred placeholder value, please let me know. Link to code
  • Additionally, regarding the zonal-mean API, should we allow users to specify the latitude range for which they want to compute the zonal mean?

Thank you for your help! :) --Amber

Thank you so much for putting this together and helping out Amber!

  1. You can refer to our contributors guide some details, however some of it might be a little outdated.
  2. Yes, having it work for face-centered data only right now is okay
  3. If there is no intersections found for a single lattitude zonal average, we should have it raise an exception, but if the user is performing the zonal average over a range, any lattitude that doesnt have any intersetions should have its value set to np.nan. For example, if we perform the zonal average of lattitudes of [-10, 0, 10] and 0 has no intersections, the result should look something like [0.13, np.nan, 0.33]
  4. See my comment above

philipc2 avatar May 21 '24 14:05 philipc2

For documentation, I have already set up a notebook for a user guide section that you can fill out.

uxarray/docs/user-guide/zonal-mean.ipynb

You should also include any functions in both our User and Internal API under uxarray/docs/internal_api.index.rst and uxarray/docs/user_api.index.rst

philipc2 avatar May 21 '24 14:05 philipc2

You can use one of my work in progress sections as a reference if you'd like.

https://uxarray--711.org.readthedocs.build/en/711/user-guide/topological-aggregations.html

philipc2 avatar May 21 '24 15:05 philipc2

I attempted to run this on one of our sample meshes, but received an error.

base_path = "../../test/meshfiles/ugrid/outCSne30/"
grid_path = base_path + "outCSne30.ug"
data_path = base_path + "outCSne30_vortex.nc"
uxds = ux.open_dataset(grid_path, data_path)
uxds['psi'].zonal_mean()

ValueError: No intersections are found for the face, please make sure the build_latlon_box generates the correct results

There seems to be an error in the bounds computation.

Here is a single face from that mesh: image

These are the bounds generated by uxgrid.bounds image

Compare this to what is generated with GeoPandas:

image

Any Thoughts? It looks to be offset by 360, with everything else being correct.

Example to Reproduce

import uxarray as ux
import numpy as np
import holoviews as hv

base_path = "../uxarray/test/meshfiles/ugrid/outCSne30/"
grid_path = base_path + "outCSne30.ug"
data_path = base_path + "outCSne30_vortex.nc"
uxds = ux.open_dataset(grid_path, data_path)

gdf = uxds.uxgrid.to_geodataframe()
ggdf = gdf.to_geopandas()

first_face_bound_ux = uxds.uxgrid.bounds.values[0]
np.rad2deg(first_face_bound_ux)

first_face_bound_gp = ggdf.bounds[0:1]

hv.Polygons(ggdf[0:1])

philipc2 avatar May 21 '24 20:05 philipc2

I attempted to run this on one of our sample meshes, but received an error.

base_path = "../../test/meshfiles/ugrid/outCSne30/"
grid_path = base_path + "outCSne30.ug"
data_path = base_path + "outCSne30_vortex.nc"
uxds = ux.open_dataset(grid_path, data_path)
uxds['psi'].zonal_mean()

ValueError: No intersections are found for the face, please make sure the build_latlon_box generates the correct results

There seems to be an error in the bounds computation.

Here is a single face from that mesh: image

These are the bounds generated by uxgrid.bounds image

Compare this to what is generated with GeoPandas:

image Any Thoughts? It looks to be offset by 360, with everything else being correct.

Example to Reproduce

import uxarray as ux
import numpy as np
import holoviews as hv

base_path = "../uxarray/test/meshfiles/ugrid/outCSne30/"
grid_path = base_path + "outCSne30.ug"
data_path = base_path + "outCSne30_vortex.nc"
uxds = ux.open_dataset(grid_path, data_path)

gdf = uxds.uxgrid.to_geodataframe()
ggdf = gdf.to_geopandas()

first_face_bound_ux = uxds.uxgrid.bounds.values[0]
np.rad2deg(first_face_bound_ux)

first_face_bound_gp = ggdf.bounds[0:1]

hv.Polygons(ggdf[0:1])

Are these latlon faces? Latlon faces means all edges are either latitude line or longitude lines.

In the bound function, if you didn't pass grid._populate_bounds(is_latlon = True) it will treat every edge as a great circle arc. And great circle arcs will arc up: The max and min latitude is not at the endpoint

There are two test helpers you can utilize to see if that's the problem The following function is a little helper function: it uses an iterative method to find the maximum latitude as the ground true. https://github.com/UXARRAY/uxarray/blob/386e92df3e3a3d436d325c2b7a6ad5f3560c8662/test/test_geometry.py#L166

But to me, the only thing that is off is latitude max:-32.48416(our) and their results:-32.484165. Which seems a lot like a precision issue here.

hongyuchen1030 avatar May 21 '24 21:05 hongyuchen1030

There used to be several test cases in which I ran your mentioned mesh files, and it produced the same bounds as my iterative method(guaranteed to be correct). However, it seems they're removed now for some reason.

And for the longitude range. The inconsistency in the longitude expression might cause issues, as what happened in the previous coordinates conversion vectorization PR.

hongyuchen1030 avatar May 21 '24 22:05 hongyuchen1030

Updated Documentation and Tests:

  • User and internal API documentation
    • uxarray/docs/internal_api.index.rst
    • uxarray/docs/user_api.index.rst
  • User guide
    • uxarray/docs/user-guide/zonal-mean.ipynb
  • Enable user-defined latitude range
  • Unit test coverage for the internal APIs, where the _get_zonal_faces_weight_at_constLat function is simulated in the test code.

I am also encountering a ValueError: No intersections are found for the face, please make sure the build_latlon_box generates the correct results error in the tests. Due to this issue, I am currently bypassing the _get_zonal_faces_weight_at_constLat function, as it seems to be the source of the problem.

Should I go ahead and try to figure out why build_latlon_box is generating incorrect bounds?

Besides that, all the internal APIs seem to be working correctly; I will implement an integrating test after the bound error gets resolved.

amberchen122 avatar Jun 18 '24 12:06 amberchen122

Updated Documentation and Tests:

  • User and internal API documentation

    • uxarray/docs/internal_api.index.rst
    • uxarray/docs/user_api.index.rst
  • User guide

    • uxarray/docs/user-guide/zonal-mean.ipynb
  • Enable user-defined latitude range

  • Unit test coverage for the internal APIs, where the _get_zonal_faces_weight_at_constLat function is simulated in the test code.

I am also encountering a ValueError: No intersections are found for the face, please make sure the build_latlon_box generates the correct results error in the tests. Due to this issue, I am currently bypassing the _get_zonal_faces_weight_at_constLat function, as it seems to be the source of the problem.

Should I go ahead and try to figure out why build_latlon_box is generating incorrect bounds?

Besides that, all the internal APIs seem to be working correctly; I will implement an integrating test after the bound error gets resolved.

Thank you very much for your work! If you get the ValueError: No intersections are found for the face, please make sure the build_latlon_box generates the correct results after you update the edge_nodes_cart helper function, then the problem might be in the new edge_nodes_cart implementation.

hongyuchen1030 avatar Jun 18 '24 18:06 hongyuchen1030

base_path = "../../test/meshfiles/ugrid/outCSne30/"
grid_path = base_path + "outCSne30.ug"
data_path = base_path + "outCSne30_vortex.nc"
uxds = ux.open_dataset(grid_path, data_path)
uxds['psi'].zonal_mean()

I was trying to reproduce this, but I could not load the data files. Do you know why this is happening @philipc2 ? image

amberchen122 avatar Jun 19 '24 19:06 amberchen122

base_path = "../../test/meshfiles/ugrid/outCSne30/"
grid_path = base_path + "outCSne30.ug"
data_path = base_path + "outCSne30_vortex.nc"
uxds = ux.open_dataset(grid_path, data_path)
uxds['psi'].zonal_mean()

I was trying to reproduce this, but I could not load the data files. Do you know why this is happening @philipc2 ? image

It's most likely due to the path being invalid. I'd double check the base path.

philipc2 avatar Jun 30 '24 04:06 philipc2

@philipc2 test/test_api.py::TestAPI::test_open_mf_dataset is failing after I added test/meshfiles/ugrid/outCSne30/outCSne30_test2.nc

I think the reason is because https://github.com/UXARRAY/uxarray/blob/00a4988513ceb816baa3890449726ac4681db653/test/test_api.py#L27C5-L28C68 this hardcoded file path incorrectly matched outCSne30_test2.nc.

Can you please look into it? I opened an issue #860

amberchen122 avatar Jul 22 '24 08:07 amberchen122

@philipc2 test/test_api.py::TestAPI::test_open_mf_dataset is failing after I added test/meshfiles/ugrid/outCSne30/outCSne30_test2.nc

I think the reason is because https://github.com/UXARRAY/uxarray/blob/00a4988513ceb816baa3890449726ac4681db653/test/test_api.py#L27C5-L28C68 this hardcoded file path incorrectly matched outCSne30_test2.nc.

Can you please look into it? I opened an issue #860

Please see our comments in that issue

erogluorhan avatar Jul 23 '24 22:07 erogluorhan

In the process of review this now, I'll also add a simple ASV benchmark that we can run through this PR.

philipc2 avatar Jul 30 '24 20:07 philipc2

Here's an example I was able to generate

image

import matplotlib.pyplot as plt
from matplotlib import gridspec
import uxarray as ux


base_path = "../../test/meshfiles/ugrid/outCSne30/"
grid_path = base_path + "outCSne30.ug"
data_path = base_path + "outCSne30_vortex.nc"

uxds = ux.open_dataset(grid_path, data_path)

zonal_avg_psi = uxds['psi'].zonal_mean()

# create a figure
fig = plt.figure()
 
# to change size of subplot's
# set height of each subplot as 8
fig.set_figheight(5)
 
# set width of each subplot as 8
fig.set_figwidth(15)
 
# create grid for different subplots
spec = gridspec.GridSpec(ncols=2, nrows=1,
                         width_ratios=[5, 1], wspace=0.1)


# Global Plot
ax0 = fig.add_subplot(spec[0])
ax0.set_xlim((-180, 180))
ax0.set_ylim((-90, 90))
pc = uxds["psi"].to_polycollection(periodic_elements='exclude')
pc.set_cmap("plasma")
ax0.add_collection(pc)
cbar = fig.colorbar(pc, ax=ax0, orientation='vertical', location='left', shrink=0.8)
cbar.set_label('Colorbar Label')
ax0.set_title("PSI")

# Zonal Plot
ax1 = fig.add_subplot(spec[1])
ax1.set_ylim((-90, 90))
ax1.set_xlim((-1.5, 1.5))
ax1.grid()
ax1.set_title("PSI Zonal Mean")

ax1.plot(zonal_avg_psi.values, zonal_avg_psi.latitude_deg.values, lw=2)

philipc2 avatar Jul 30 '24 21:07 philipc2