uxarray icon indicating copy to clipboard operation
uxarray copied to clipboard

Hardcoded File Path Causing Test Failure

Open amberchen122 opened this issue 1 year ago • 6 comments
trafficstars

Description The hard coded file path in test/test_api.py::TestAPI::test_open_mf_dataset is unsafe. I added a test file to test/meshfiles/ugrid/outCSne30/outCSne30_test2.nc from Paul, and it failed.

Details

The test test/test_api.py::TestAPI::test_open_mf_dataset is failing after I added test/meshfiles/ugrid/outCSne30/outCSne30_test2.nc.

I think the reason is because of this hardcoded file path incorrectly matching outCSne30_test2.nc.

Error Message

FAILED test/test_api.py::TestAPI::test_open_mf_dataset - ValueError: Could not find any dimension coordinates to use to order the datasets for concatenation

amberchen122 avatar Jul 22 '24 08:07 amberchen122

Yeah this should really be a list of paths since that file path will assume any paths that match the outCSne30_ prefix with a .nc ending. We should have it point to the two data files as seen below.

image

dsfiles_mf_ne30  = [current_path / "meshfiles" / "ugrid" / "outCSne30" / "outCSne30_var2.nc",
                    current_path / "meshfiles" / "ugrid" / "outCSne30" / "outCSne30_vortex.nc"]

philipc2 avatar Jul 23 '24 18:07 philipc2

Even though explicitly listing the file names to be opened with open_mfdatasets can be an alternative solution, I don't think giving file name patterns as in here is the actual reason of the problem.

That is common practice to have multiple data files from the same workflow in the same folder, e.g. separate files for different time stamps in a time-series analysis or separate files for different variables in a big simulation. That said, if we think all of the files under the test/meshfiles/ugrid/outCSne30/ directory are somehow related to each other in such a way, let us keep storing them there; otherwise, we need to consider oving some of them into a different folder. If the former, then I'd expect a code something like this should still work:

dsfiles_mf_ne30 = str( current_path) + "/meshfiles/ugrid/outCSne30/outCSne30_*.nc"

uxds_mf_ne30 = ux.open_mfdataset(self.gridfile_ne30, self.dsfiles_mf_ne30, , concat_dim="Time", combine="nested") # Use the actual dimension variable to be concatenated over those files

erogluorhan avatar Jul 23 '24 22:07 erogluorhan

Thank you for your input! I will move the test files needed for zonal-mean testing (test/meshfiles/ugrid/outCSne30_zonal_test/outCSne30_test2.nc and test/meshfiles/ugrid/outCSne30_zonal_test/outCSne30_test2.nc) to a new folder: test/meshfiles/ugrid/outCSne30_zonal_test.

Does this sound like a good solution for the zonal-mean PR failing the API test?

amberchen122 avatar Jul 26 '24 05:07 amberchen122

Before that, please determine if those newer zonal test files are very close in format/content with the original outCSne30_var2.nc and outCSne30_vortex.nc files. If that's the case, I'd instead suggest that you do the following code change in the failing tests, and it should work:

dsfiles_mf_ne30 = str( current_path) + "/meshfiles/ugrid/outCSne30/outCSne30_*.nc"

uxds_mf_ne30 = ux.open_mfdataset(self.gridfile_ne30, self.dsfiles_mf_ne30, , concat_dim="Time", combine="nested") # Use the actual dimension variable to be concatenated over those files

If that's not the case though, then your suggestion looks good.

erogluorhan avatar Jul 26 '24 18:07 erogluorhan

Before that, please determine if those newer zonal test files are very close in format/content with the original outCSne30_var2.nc and outCSne30_vortex.nc files. If that's the case, I'd instead suggest that you do the following code change in the failing tests, and it should work:

dsfiles_mf_ne30 = str( current_path) + "/meshfiles/ugrid/outCSne30/outCSne30_*.nc"

uxds_mf_ne30 = ux.open_mfdataset(self.gridfile_ne30, self.dsfiles_mf_ne30, , concat_dim="Time", combine="nested") # Use the actual dimension variable to be concatenated over those files

If that's not the case though, then your suggestion looks good.

Thank you for your suggestion. I believe the zonal test files are in close format with the original test files. I have updated the test_open_mf_dataset as you recommended.

The new uxds_mf_ne30.uxgrid._ds.data_vars has one more dimension edge_node_connectivity:

test/test_api.py uxds_mf_ne30.uxgrid._ds.data_vars: grid_topology int32 4B -2147483647 face_node_connectivity (n_face, n_max_face_nodes) int64 173kB 0 8 ... 6 298 node_lon (n_node) float64 43kB -45.0 45.0 ... 138.0 135.0 node_lat (n_node) float64 43kB ... edge_node_connectivity (n_edge, two) int64 173kB 0 8 0 ... 5400 5400 5401

Therefore, I also modified the test constants in test/constants.py. Could you please verify if the new test parameters look good? The modifications are in the zonal-mean branch.

If this solution to the test_api.py bug is acceptable, then PR#785 is also ready for review. :)

amberchen122 avatar Jul 30 '24 07:07 amberchen122

Yes, this looks like a good solution to me; thanks a lot! Hey @philipc2 , do you agree with me that this will be a better solution forward?

erogluorhan avatar Jul 30 '24 19:07 erogluorhan

@rajeeja

Can you check if this is fixed?

philipc2 avatar Mar 18 '25 20:03 philipc2

fixed, no hardcoded paths.

rajeeja avatar Mar 24 '25 16:03 rajeeja