uxarray
uxarray copied to clipboard
Compute Grid Centerpoint using Welzl's algorithm
I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them.
Maybe we should consider the following design?
- Have the default
face_xyzorface_latlonvalues be either what was parsed from a dataset OR the existing Cartesian averaging - Introduce a grid-level
Grid.populate_face_coordinates()function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.
This would make the workflow look something like the following:
# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon
# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')
# value will now be populated using your approach
uxgrid.face_lon
# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')
# value will now be populated using artesian average
uxgrid.face_lon
This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?
I'm not a big fan of the
_ctrptapproach to the properties. These values are still, for example,face_lonandface_lat, they simply use a different algorithm to compute them.Maybe we should consider the following design?
- Have the default
face_xyzorface_latlonvalues be either what was parsed from a dataset OR the existing Cartesian averaging- Introduce a grid-level
Grid.populate_face_coordinates()function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.This would make the workflow look something like the following:
# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg uxgrid.face_lon # I want to explicitly set the algorithm to be Welzl uxgrid.populate_face_coordinates(method='welzl') # value will now be populated using your approach uxgrid.face_lon # I want to re-populate again using cartesiain averaging uxgrid.populate_face_coordinates(method='cartesian average') # value will now be populated using artesian average uxgrid.face_lonThis allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?
During my testing and sometimes in testing the face geometry both centerpoint and centroid might be needed. When working with a mesh I wanted to check how much did one deviate from the other and if one or the other made more sense.
We might be able to get both with the way you propose also, but with two calls to populate one with either options, having both available to the grid object at once might be better.
We can get another name for ctrpt, I don't like it also:)
I'm not a big fan of the
_ctrptapproach to the properties. These values are still, for example,face_lonandface_lat, they simply use a different algorithm to compute them. Maybe we should consider the following design?
- Have the default
face_xyzorface_latlonvalues be either what was parsed from a dataset OR the existing Cartesian averaging- Introduce a grid-level
Grid.populate_face_coordinates()function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.This would make the workflow look something like the following:
# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg uxgrid.face_lon # I want to explicitly set the algorithm to be Welzl uxgrid.populate_face_coordinates(method='welzl') # value will now be populated using your approach uxgrid.face_lon # I want to re-populate again using cartesiain averaging uxgrid.populate_face_coordinates(method='cartesian average') # value will now be populated using artesian average uxgrid.face_lonThis allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?
During my testing and sometimes in testing the face geometry both centerpoint and centroid might be needed. When working with a mesh I wanted to check how much did one deviate from the other and if one or the other made more sense.
We might be able to get both with the way you propose also, but with two calls to
populateone with either options, having both available to the grid object at once might be better.We can get another name for
ctrpt, I don't like it also:)
My main concern with breaking up the different types of coordinates in separate attributes is that it'll add extra overhead for us to ensure that the coordinates we read match the ones that we want to store, not to mention needing to redefine / extent the UGRID conventions further past what we've already done. Even with this (and say some other method down the line), this could end up looking like:
Grid.face_lonGrid.face_lon_centerpointGrid.face_lon_some_other_defenition
Consider the case where two UGRID (or any other format) grid files are loaded into UXarray. If we move forward with a split attribute approach, we'd need to ensure that the coordinates we are reading either go into face_lat/lon or face_lat/lon_centerpoints. There's also no easy way to determine what method each dataset used to compute the centroids at the loading step without parsing for any specific attributes in the file (if they exist), since this is not outlined in the UGRID conventions.
I'm still in favor of keeping face_lon and face_lat and general variables for storing some coordinate that represents the center/midpoint/centroid etc. of the face. This does limit us to only storing one type of "center" coordinate at a time, but ensures that we don't restrict us to strictly defining the type of definition for the center.
@paullric @rljacob Is there ever a sceneiro where we would want to have more than one definition of a "center" coordinate attached to a grid at a time?
@paullric @rljacob Is there ever a sceneiro where we would want to have more than one definition of a "center" coordinate attached to a grid at a time?
even if it was needed, we can handle it as used with different variables in the test_centroids.py. I have removed the _ctrpt properties in favor of only using face_lon/lat/x/y/z.
Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.
Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.
originally I wanted to compare both and hence keep both, but it makes more sense to just have one attribute and keep the grid object clean. I committed the change to have only one.
Member
Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.
I think the key idea of Welzl's algorithm is not just to find the center point but to compute a bounding circle. This has several advantages:
- Storage Efficiency: Only two variables are needed—the center point and the radius—reducing storage requirements.
- Geometric Integrity: A bounding circle inherently accounts for spherical geometry, including curvature and pole points, as well as the antimeridian.
However, there are some disadvantages compared to using a latitude-longitude box:
- Loose Fit for Slim Faces: A bounding circle can result in a larger radius, which is not as tight when bounding "slim" faces.
Therefore, providing both the bounding circle and the lat-lon box offers users a more comprehensive solution.
Member
Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.
I think the key idea of Welzl's algorithm is not just to find the center point but to compute a bounding circle. This has several advantages:
- Storage Efficiency: Only two variables are needed—the center point and the radius—reducing storage requirements.
- Geometric Integrity: A bounding circle inherently accounts for spherical geometry, including curvature and pole points, as well as the antimeridian.
However, there are some disadvantages compared to using a latitude-longitude box:
- Loose Fit for Slim Faces: A bounding circle can result in a larger radius, which is not as tight when bounding "slim" faces.
Therefore, providing both the bounding circle and the lat-lon box offers users a more comprehensive solution.
Nice summary, Welzl being iterative is computationally a little expensive - but has a clear place as you suggest above.
Just that now: they would both be stored in the same variable in the Grid object. External users can get centerpoint or centroid by specifying something like
uxgrid.compute_face_center(method = "welzl"). # or average
@philipc2 any ideas about asv failure:
https://github.com/UXARRAY/uxarray/actions/runs/10568661430/job/29280049250?pr=811#step:11:5
@philipc2 any ideas about asv failure:
https://github.com/UXARRAY/uxarray/actions/runs/10568661430/job/29280049250?pr=811#step:11:5
Appears to be an issue with this part of the workflow. Looking into it now.
https://github.com/UXARRAY/uxarray/blob/38ae0017e1b08cbaf6b9c32a7b4ee9c2d2dc1401/.github/workflows/asv-benchmarking-pr.yml#L23-L33
Specifically, it can't find a package called 'build'
The following package could not be installed
└─ build does not exist (perhaps a typo or a missing channel).
Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue
Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue
How about doing this in such a way: In the beginning of each indidvual case that tests a njit-decorated function, disable numba, in the end, enable it back? This helps us notice right away that whenever we see this kind of disable-enable pairs, that is a case for testing a njit-decorated function.
Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue
How about doing this in such a way: In the beginning of each indidvual case that tests a njit-decorated function, disable numba, in the end, enable it back? This helps us notice right away that whenever we see this kind of disable-enable pairs, that is a case for testing a njit-decorated function.
I removed all the numba stuff on my local and the coverage (85% total and 95% for coordinates.py - see pic below) is considerably higher for sha: dba53dc8 which is before I introduced circular dependency.
ASV Benchmarking
Benchmark Comparison Results
Benchmarks that have improved:
| Change | Before [bca3b8cd] | After [066bb6b2] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| failed | 6.67±0.1ms | n/a | mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km') | |
| failed | 3.50±0.03ms | n/a | mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km') | |
| failed | 3.60±0.01s | n/a | mpas_ocean.ConstructFaceLatLon.time_welzl('120km') | |
| failed | 226±2ms | n/a | mpas_ocean.ConstructFaceLatLon.time_welzl('480km') | |
| - | 955±20ns | 832±9ns | 0.87 | mpas_ocean.ConstructTreeStructures.time_kd_tree('120km') |
| - | 340±30ns | 285±1ns | 0.84 | mpas_ocean.ConstructTreeStructures.time_kd_tree('480km') |
| - | 466M | 407M | 0.87 | mpas_ocean.Integrate.peakmem_integrate('480km') |
Benchmarks that have stayed the same:
| Change | Before [bca3b8cd] | After [066bb6b2] | Ratio | Benchmark (Parameter) |
|---|---|---|---|---|
| 442M | 438M | 0.99 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc')) | |
| 442M | 442M | 1 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc')) | |
| 445M | 445M | 1 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc')) | |
| 443M | 443M | 1 | face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc')) | |
| 1.59±0.01s | 1.59±0s | 1 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc')) | |
| 228±0.05ms | 225±1ms | 0.99 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc')) | |
| 2.03±0.01s | 2.07±0.01s | 1.02 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc')) | |
| 7.78±0.07ms | 7.68±0.07ms | 0.99 | face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc')) | |
| 3.02±0.04s | 2.99±0.02s | 0.99 | import.Imports.timeraw_import_uxarray | |
| 675±20μs | 654±8μs | 0.97 | mpas_ocean.CheckNorm.time_check_norm('120km') | |
| 442±5μs | 429±4μs | 0.97 | mpas_ocean.CheckNorm.time_check_norm('480km') | |
| 636±10ms | 645±2ms | 1.01 | mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km') | |
| 41.6±0.4ms | 41.2±0.2ms | 0.99 | mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km') | |
| 1.84±0.1ms | 1.93±0.1ms | 1.05 | mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km') | |
| 499±20μs | 497±10μs | 1 | mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km') | |
| 1.26±0μs | 1.32±0.02μs | 1.05 | mpas_ocean.ConstructTreeStructures.time_ball_tree('120km') | |
| 332±5ns | 306±4ns | 0.92 | mpas_ocean.ConstructTreeStructures.time_ball_tree('480km') | |
| 1.04±0.01s | 1.06±0s | 1.02 | mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False) | |
| 54.8±0.5ms | 55.8±0.6ms | 1.02 | mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True) | |
| 78.7±0.2ms | 80.3±1ms | 1.02 | mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False) | |
| 5.10±0.09ms | 5.07±0.1ms | 0.99 | mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True) | |
| 318M | 318M | 1 | mpas_ocean.Gradient.peakmem_gradient('120km') | |
| 295M | 295M | 1 | mpas_ocean.Gradient.peakmem_gradient('480km') | |
| 2.69±0.02ms | 2.66±0.03ms | 0.99 | mpas_ocean.Gradient.time_gradient('120km') | |
| 295±4μs | 286±0.5μs | 0.97 | mpas_ocean.Gradient.time_gradient('480km') | |
| 239±7μs | 251±7μs | 1.05 | mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km') | |
| 121±0.7μs | 121±0.6μs | 1 | mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km') | |
| 425M | 424M | 1 | mpas_ocean.Integrate.peakmem_integrate('120km') | |
| 175±2ms | 177±5ms | 1.01 | mpas_ocean.Integrate.time_integrate('120km') | |
| 11.8±0.1ms | 12.9±0.1ms | 1.1 | mpas_ocean.Integrate.time_integrate('480km') | |
| 347±5ms | 344±5ms | 0.99 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude') | |
| 339±3ms | 342±4ms | 1.01 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include') | |
| 343±3ms | 342±2ms | 1 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split') | |
| 22.6±0.2ms | 22.8±0.3ms | 1.01 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude') | |
| 22.6±0.4ms | 22.9±0.3ms | 1.01 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include') | |
| 23.1±0.5ms | 22.9±0.4ms | 0.99 | mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split') | |
| 54.5±0.1ms | 54.8±0.3ms | 1.01 | mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping | |
| 44.2±0.04ms | 44.2±0.01ms | 1 | mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping | |
| 360±2ms | 360±0.5ms | 1 | mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping | |
| 263±0.2ms | 265±1ms | 1.01 | mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping | |
| 295M | 291M | 0.99 | quad_hexagon.QuadHexagon.peakmem_open_dataset | |
| 291M | 291M | 1 | quad_hexagon.QuadHexagon.peakmem_open_grid | |
| 6.49±0.04ms | 6.56±0.06ms | 1.01 | quad_hexagon.QuadHexagon.time_open_dataset | |
| 5.55±0.01ms | 5.66±0.06ms | 1.02 | quad_hexagon.QuadHexagon.time_open_grid |