uxarray Compute Grid Centerpoint using Welzl's algorithm

trafficstars

Jun 11 '24 11:06 rajeeja

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them.

Maybe we should consider the following design?

Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging
Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:

# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using artesian average
uxgrid.face_lon

This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?

Jul 02 '24 17:07 philipc2

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them.

Maybe we should consider the following design?

Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging

Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:
# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using artesian average
uxgrid.face_lon
This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?

During my testing and sometimes in testing the face geometry both centerpoint and centroid might be needed. When working with a mesh I wanted to check how much did one deviate from the other and if one or the other made more sense.

We might be able to get both with the way you propose also, but with two calls to populate one with either options, having both available to the grid object at once might be better.

We can get another name for ctrpt, I don't like it also:)

Jul 09 '24 00:07 rajeeja

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them. Maybe we should consider the following design?

Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging

Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:
# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using artesian average
uxgrid.face_lon
This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?
During my testing and sometimes in testing the face geometry both centerpoint and centroid might be needed. When working with a mesh I wanted to check how much did one deviate from the other and if one or the other made more sense.

We might be able to get both with the way you propose also, but with two calls to populate one with either options, having both available to the grid object at once might be better.

We can get another name for ctrpt, I don't like it also:)

My main concern with breaking up the different types of coordinates in separate attributes is that it'll add extra overhead for us to ensure that the coordinates we read match the ones that we want to store, not to mention needing to redefine / extent the UGRID conventions further past what we've already done. Even with this (and say some other method down the line), this could end up looking like:

Grid.face_lon
Grid.face_lon_centerpoint
Grid.face_lon_some_other_defenition

Consider the case where two UGRID (or any other format) grid files are loaded into UXarray. If we move forward with a split attribute approach, we'd need to ensure that the coordinates we are reading either go into face_lat/lon or face_lat/lon_centerpoints. There's also no easy way to determine what method each dataset used to compute the centroids at the loading step without parsing for any specific attributes in the file (if they exist), since this is not outlined in the UGRID conventions.

I'm still in favor of keeping face_lon and face_lat and general variables for storing some coordinate that represents the center/midpoint/centroid etc. of the face. This does limit us to only storing one type of "center" coordinate at a time, but ensures that we don't restrict us to strictly defining the type of definition for the center.

@paullric @rljacob Is there ever a sceneiro where we would want to have more than one definition of a "center" coordinate attached to a grid at a time?

Aug 01 '24 19:08 philipc2

@paullric @rljacob Is there ever a sceneiro where we would want to have more than one definition of a "center" coordinate attached to a grid at a time?

even if it was needed, we can handle it as used with different variables in the test_centroids.py. I have removed the _ctrpt properties in favor of only using face_lon/lat/x/y/z.

Aug 06 '24 15:08 rajeeja

Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.

Aug 06 '24 19:08 rljacob

Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.

originally I wanted to compare both and hence keep both, but it makes more sense to just have one attribute and keep the grid object clean. I committed the change to have only one.

Aug 06 '24 20:08 rajeeja

Member

Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.

I think the key idea of Welzl's algorithm is not just to find the center point but to compute a bounding circle. This has several advantages:

Storage Efficiency: Only two variables are needed—the center point and the radius—reducing storage requirements.
Geometric Integrity: A bounding circle inherently accounts for spherical geometry, including curvature and pole points, as well as the antimeridian.

However, there are some disadvantages compared to using a latitude-longitude box:

Loose Fit for Slim Faces: A bounding circle can result in a larger radius, which is not as tight when bounding "slim" faces.

Therefore, providing both the bounding circle and the lat-lon box offers users a more comprehensive solution.

Aug 06 '24 20:08 hongyuchen1030

Member

Why do we need 2 ways to calculate the center of a face? I see no reason to use them both at once.

I think the key idea of Welzl's algorithm is not just to find the center point but to compute a bounding circle. This has several advantages:

Storage Efficiency: Only two variables are needed—the center point and the radius—reducing storage requirements.

Geometric Integrity: A bounding circle inherently accounts for spherical geometry, including curvature and pole points, as well as the antimeridian.

However, there are some disadvantages compared to using a latitude-longitude box:

Loose Fit for Slim Faces: A bounding circle can result in a larger radius, which is not as tight when bounding "slim" faces.

Therefore, providing both the bounding circle and the lat-lon box offers users a more comprehensive solution.

Nice summary, Welzl being iterative is computationally a little expensive - but has a clear place as you suggest above. Just that now: they would both be stored in the same variable in the Grid object. External users can get centerpoint or centroid by specifying something like

uxgrid.compute_face_center(method = "welzl"). # or average

Aug 06 '24 20:08 rajeeja

@philipc2 any ideas about asv failure:

https://github.com/UXARRAY/uxarray/actions/runs/10568661430/job/29280049250?pr=811#step:11:5

Aug 26 '24 23:08 rajeeja

@philipc2 any ideas about asv failure:

https://github.com/UXARRAY/uxarray/actions/runs/10568661430/job/29280049250?pr=811#step:11:5

Appears to be an issue with this part of the workflow. Looking into it now.

https://github.com/UXARRAY/uxarray/blob/38ae0017e1b08cbaf6b9c32a7b4ee9c2d2dc1401/.github/workflows/asv-benchmarking-pr.yml#L23-L33

Specifically, it can't find a package called 'build'

  The following package could not be installed
  └─ build does not exist (perhaps a typo or a missing channel).

Aug 26 '24 23:08 philipc2

Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue

Sep 16 '24 23:09 rajeeja

Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue

How about doing this in such a way: In the beginning of each indidvual case that tests a njit-decorated function, disable numba, in the end, enable it back? This helps us notice right away that whenever we see this kind of disable-enable pairs, that is a case for testing a njit-decorated function.

Sep 17 '24 00:09 erogluorhan

Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue

How about doing this in such a way: In the beginning of each indidvual case that tests a njit-decorated function, disable numba, in the end, enable it back? This helps us notice right away that whenever we see this kind of disable-enable pairs, that is a case for testing a njit-decorated function.

I removed all the numba stuff on my local and the coverage (85% total and 95% for coordinates.py - see pic below) is considerably higher for sha: dba53dc8 which is before I introduced circular dependency.

Sep 17 '24 22:09 rajeeja

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

Change	Before [bca3b8cd]	After [066bb6b2]	Ratio	Benchmark (Parameter)
	failed	6.67±0.1ms	n/a	mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km')
	failed	3.50±0.03ms	n/a	mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km')
	failed	3.60±0.01s	n/a	mpas_ocean.ConstructFaceLatLon.time_welzl('120km')
	failed	226±2ms	n/a	mpas_ocean.ConstructFaceLatLon.time_welzl('480km')
-	955±20ns	832±9ns	0.87	mpas_ocean.ConstructTreeStructures.time_kd_tree('120km')
-	340±30ns	285±1ns	0.84	mpas_ocean.ConstructTreeStructures.time_kd_tree('480km')
-	466M	407M	0.87	mpas_ocean.Integrate.peakmem_integrate('480km')

Benchmarks that have stayed the same:

Before [bca3b8cd]	After [066bb6b2]	Ratio	Benchmark (Parameter)
442M	438M	0.99	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
442M	442M	1	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
445M	445M	1	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
443M	443M	1	face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
1.59±0.01s	1.59±0s	1	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
228±0.05ms	225±1ms	0.99	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
2.03±0.01s	2.07±0.01s	1.02	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
7.78±0.07ms	7.68±0.07ms	0.99	face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
3.02±0.04s	2.99±0.02s	0.99	import.Imports.timeraw_import_uxarray
675±20μs	654±8μs	0.97	mpas_ocean.CheckNorm.time_check_norm('120km')
442±5μs	429±4μs	0.97	mpas_ocean.CheckNorm.time_check_norm('480km')
636±10ms	645±2ms	1.01	mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km')
41.6±0.4ms	41.2±0.2ms	0.99	mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km')
1.84±0.1ms	1.93±0.1ms	1.05	mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km')
499±20μs	497±10μs	1	mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km')
1.26±0μs	1.32±0.02μs	1.05	mpas_ocean.ConstructTreeStructures.time_ball_tree('120km')
332±5ns	306±4ns	0.92	mpas_ocean.ConstructTreeStructures.time_ball_tree('480km')
1.04±0.01s	1.06±0s	1.02	mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False)
54.8±0.5ms	55.8±0.6ms	1.02	mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True)
78.7±0.2ms	80.3±1ms	1.02	mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False)
5.10±0.09ms	5.07±0.1ms	0.99	mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True)
318M	318M	1	mpas_ocean.Gradient.peakmem_gradient('120km')
295M	295M	1	mpas_ocean.Gradient.peakmem_gradient('480km')
2.69±0.02ms	2.66±0.03ms	0.99	mpas_ocean.Gradient.time_gradient('120km')
295±4μs	286±0.5μs	0.97	mpas_ocean.Gradient.time_gradient('480km')
239±7μs	251±7μs	1.05	mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km')
121±0.7μs	121±0.6μs	1	mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km')
425M	424M	1	mpas_ocean.Integrate.peakmem_integrate('120km')
175±2ms	177±5ms	1.01	mpas_ocean.Integrate.time_integrate('120km')
11.8±0.1ms	12.9±0.1ms	1.1	mpas_ocean.Integrate.time_integrate('480km')
347±5ms	344±5ms	0.99	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude')
339±3ms	342±4ms	1.01	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include')
343±3ms	342±2ms	1	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split')
22.6±0.2ms	22.8±0.3ms	1.01	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude')
22.6±0.4ms	22.9±0.3ms	1.01	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include')
23.1±0.5ms	22.9±0.4ms	0.99	mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split')
54.5±0.1ms	54.8±0.3ms	1.01	mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping
44.2±0.04ms	44.2±0.01ms	1	mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping
360±2ms	360±0.5ms	1	mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping
263±0.2ms	265±1ms	1.01	mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping
295M	291M	0.99	quad_hexagon.QuadHexagon.peakmem_open_dataset
291M	291M	1	quad_hexagon.QuadHexagon.peakmem_open_grid
6.49±0.04ms	6.56±0.06ms	1.01	quad_hexagon.QuadHexagon.time_open_dataset
5.55±0.01ms	5.66±0.06ms	1.02	quad_hexagon.QuadHexagon.time_open_grid

Oct 07 '24 20:10 github-actions[bot]

uxarray uxarray copied to clipboard

Compute Grid Centerpoint using Welzl's algorithm

ASV Benchmarking

uxarray
uxarray copied to clipboard