uxarray icon indicating copy to clipboard operation
uxarray copied to clipboard

Compute Grid Centerpoint using Welzl's algorithm

Open rajeeja opened this issue 1 year ago • 9 comments
trafficstars

rajeeja avatar Jun 11 '24 11:06 rajeeja

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them.

Maybe we should consider the following design?

  • Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging
  • Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:

# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using artesian average
uxgrid.face_lon

This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?

philipc2 avatar Jul 02 '24 17:07 philipc2

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them.

Maybe we should consider the following design?

  • Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging
  • Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:

# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using artesian average
uxgrid.face_lon

This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?

During my testing and sometimes in testing the face geometry both centerpoint and centroid might be needed. When working with a mesh I wanted to check how much did one deviate from the other and if one or the other made more sense.

We might be able to get both with the way you propose also, but with two calls to populate one with either options, having both available to the grid object at once might be better.

We can get another name for ctrpt, I don't like it also:)

rajeeja avatar Jul 09 '24 00:07 rajeeja

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them. Maybe we should consider the following design?

  • Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging
  • Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:

# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using artesian average
uxgrid.face_lon

This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?

During my testing and sometimes in testing the face geometry both centerpoint and centroid might be needed. When working with a mesh I wanted to check how much did one deviate from the other and if one or the other made more sense.

We might be able to get both with the way you propose also, but with two calls to populate one with either options, having both available to the grid object at once might be better.

We can get another name for ctrpt, I don't like it also:)

My main concern with breaking up the different types of coordinates in separate attributes is that it'll add extra overhead for us to ensure that the coordinates we read match the ones that we want to store, not to mention needing to redefine / extent the UGRID conventions further past what we've already done. Even with this (and say some other method down the line), this could end up looking like:

  • Grid.face_lon
  • Grid.face_lon_centerpoint
  • Grid.face_lon_some_other_defenition

Consider the case where two UGRID (or any other format) grid files are loaded into UXarray. If we move forward with a split attribute approach, we'd need to ensure that the coordinates we are reading either go into face_lat/lon or face_lat/lon_centerpoints. There's also no easy way to determine what method each dataset used to compute the centroids at the loading step without parsing for any specific attributes in the file (if they exist), since this is not outlined in the UGRID conventions.

I'm still in favor of keeping face_lon and face_lat and general variables for storing some coordinate that represents the center/midpoint/centroid etc. of the face. This does limit us to only storing one type of "center" coordinate at a time, but ensures that we don't restrict us to strictly defining the type of definition for the center.

@paullric @rljacob Is there ever a sceneiro where we would want to have more than one definition of a "center" coordinate attached to a grid at a time?

philipc2 avatar Aug 01 '24 19:08 philipc2

@paullric @rljacob Is there ever a sceneiro where we would want to have more than one definition of a "center" coordinate attached to a grid at a time?

even if it was needed, we can handle it as used with different variables in the test_centroids.py. I have removed the _ctrpt properties in favor of only using face_lon/lat/x/y/z.

rajeeja avatar Aug 06 '24 15:08 rajeeja

Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.

rljacob avatar Aug 06 '24 19:08 rljacob

Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.

originally I wanted to compare both and hence keep both, but it makes more sense to just have one attribute and keep the grid object clean. I committed the change to have only one.

rajeeja avatar Aug 06 '24 20:08 rajeeja

Member

Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.

I think the key idea of Welzl's algorithm is not just to find the center point but to compute a bounding circle. This has several advantages:

  1. Storage Efficiency: Only two variables are needed—the center point and the radius—reducing storage requirements.
  2. Geometric Integrity: A bounding circle inherently accounts for spherical geometry, including curvature and pole points, as well as the antimeridian.

However, there are some disadvantages compared to using a latitude-longitude box:

  • Loose Fit for Slim Faces: A bounding circle can result in a larger radius, which is not as tight when bounding "slim" faces.

Therefore, providing both the bounding circle and the lat-lon box offers users a more comprehensive solution.

hongyuchen1030 avatar Aug 06 '24 20:08 hongyuchen1030

Member

Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.

I think the key idea of Welzl's algorithm is not just to find the center point but to compute a bounding circle. This has several advantages:

  1. Storage Efficiency: Only two variables are needed—the center point and the radius—reducing storage requirements.
  2. Geometric Integrity: A bounding circle inherently accounts for spherical geometry, including curvature and pole points, as well as the antimeridian.

However, there are some disadvantages compared to using a latitude-longitude box:

  • Loose Fit for Slim Faces: A bounding circle can result in a larger radius, which is not as tight when bounding "slim" faces.

Therefore, providing both the bounding circle and the lat-lon box offers users a more comprehensive solution.

Nice summary, Welzl being iterative is computationally a little expensive - but has a clear place as you suggest above. Just that now: they would both be stored in the same variable in the Grid object. External users can get centerpoint or centroid by specifying something like

uxgrid.compute_face_center(method = "welzl"). # or average

rajeeja avatar Aug 06 '24 20:08 rajeeja

@philipc2 any ideas about asv failure:

https://github.com/UXARRAY/uxarray/actions/runs/10568661430/job/29280049250?pr=811#step:11:5

rajeeja avatar Aug 26 '24 23:08 rajeeja

@philipc2 any ideas about asv failure:

https://github.com/UXARRAY/uxarray/actions/runs/10568661430/job/29280049250?pr=811#step:11:5

Appears to be an issue with this part of the workflow. Looking into it now.

https://github.com/UXARRAY/uxarray/blob/38ae0017e1b08cbaf6b9c32a7b4ee9c2d2dc1401/.github/workflows/asv-benchmarking-pr.yml#L23-L33

Specifically, it can't find a package called 'build'

  The following package could not be installed
  └─ build does not exist (perhaps a typo or a missing channel).

philipc2 avatar Aug 26 '24 23:08 philipc2

Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue

rajeeja avatar Sep 16 '24 23:09 rajeeja

Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue

How about doing this in such a way: In the beginning of each indidvual case that tests a njit-decorated function, disable numba, in the end, enable it back? This helps us notice right away that whenever we see this kind of disable-enable pairs, that is a case for testing a njit-decorated function.

erogluorhan avatar Sep 17 '24 00:09 erogluorhan

Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue

How about doing this in such a way: In the beginning of each indidvual case that tests a njit-decorated function, disable numba, in the end, enable it back? This helps us notice right away that whenever we see this kind of disable-enable pairs, that is a case for testing a njit-decorated function.

I removed all the numba stuff on my local and the coverage (85% total and 95% for coordinates.py - see pic below) is considerably higher for sha: dba53dc8 which is before I introduced circular dependency.

Screenshot 2024-09-17 at 17 37 01

rajeeja avatar Sep 17 '24 22:09 rajeeja

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

Change Before [bca3b8cd] After [066bb6b2] Ratio Benchmark (Parameter)
failed 6.67±0.1ms n/a mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km')
failed 3.50±0.03ms n/a mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km')
failed 3.60±0.01s n/a mpas_ocean.ConstructFaceLatLon.time_welzl('120km')
failed 226±2ms n/a mpas_ocean.ConstructFaceLatLon.time_welzl('480km')
- 955±20ns 832±9ns 0.87 mpas_ocean.ConstructTreeStructures.time_kd_tree('120km')
- 340±30ns 285±1ns 0.84 mpas_ocean.ConstructTreeStructures.time_kd_tree('480km')
- 466M 407M 0.87 mpas_ocean.Integrate.peakmem_integrate('480km')

Benchmarks that have stayed the same:

Change Before [bca3b8cd] After [066bb6b2] Ratio Benchmark (Parameter)
442M 438M 0.99 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
442M 442M 1 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
445M 445M 1 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
443M 443M 1 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
1.59±0.01s 1.59±0s 1 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
228±0.05ms 225±1ms 0.99 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
2.03±0.01s 2.07±0.01s 1.02 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
7.78±0.07ms 7.68±0.07ms 0.99 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
3.02±0.04s 2.99±0.02s 0.99 import.Imports.timeraw_import_uxarray
675±20μs 654±8μs 0.97 mpas_ocean.CheckNorm.time_check_norm('120km')
442±5μs 429±4μs 0.97 mpas_ocean.CheckNorm.time_check_norm('480km')
636±10ms 645±2ms 1.01 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km')
41.6±0.4ms 41.2±0.2ms 0.99 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km')
1.84±0.1ms 1.93±0.1ms 1.05 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km')
499±20μs 497±10μs 1 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km')
1.26±0μs 1.32±0.02μs 1.05 mpas_ocean.ConstructTreeStructures.time_ball_tree('120km')
332±5ns 306±4ns 0.92 mpas_ocean.ConstructTreeStructures.time_ball_tree('480km')
1.04±0.01s 1.06±0s 1.02 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False)
54.8±0.5ms 55.8±0.6ms 1.02 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True)
78.7±0.2ms 80.3±1ms 1.02 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False)
5.10±0.09ms 5.07±0.1ms 0.99 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True)
318M 318M 1 mpas_ocean.Gradient.peakmem_gradient('120km')
295M 295M 1 mpas_ocean.Gradient.peakmem_gradient('480km')
2.69±0.02ms 2.66±0.03ms 0.99 mpas_ocean.Gradient.time_gradient('120km')
295±4μs 286±0.5μs 0.97 mpas_ocean.Gradient.time_gradient('480km')
239±7μs 251±7μs 1.05 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km')
121±0.7μs 121±0.6μs 1 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km')
425M 424M 1 mpas_ocean.Integrate.peakmem_integrate('120km')
175±2ms 177±5ms 1.01 mpas_ocean.Integrate.time_integrate('120km')
11.8±0.1ms 12.9±0.1ms 1.1 mpas_ocean.Integrate.time_integrate('480km')
347±5ms 344±5ms 0.99 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude')
339±3ms 342±4ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include')
343±3ms 342±2ms 1 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split')
22.6±0.2ms 22.8±0.3ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude')
22.6±0.4ms 22.9±0.3ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include')
23.1±0.5ms 22.9±0.4ms 0.99 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split')
54.5±0.1ms 54.8±0.3ms 1.01 mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping
44.2±0.04ms 44.2±0.01ms 1 mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping
360±2ms 360±0.5ms 1 mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping
263±0.2ms 265±1ms 1.01 mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping
295M 291M 0.99 quad_hexagon.QuadHexagon.peakmem_open_dataset
291M 291M 1 quad_hexagon.QuadHexagon.peakmem_open_grid
6.49±0.04ms 6.56±0.06ms 1.01 quad_hexagon.QuadHexagon.time_open_dataset
5.55±0.01ms 5.66±0.06ms 1.02 quad_hexagon.QuadHexagon.time_open_grid

github-actions[bot] avatar Oct 07 '24 20:10 github-actions[bot]