cf-python icon indicating copy to clipboard operation
cf-python copied to clipboard

Optimsed lazy indexing - h5netcdf backend - Active storage reductions

Open davidhassell opened this issue 1 year ago • 10 comments

Fixes #501

This PR has three parts, which ideally would have been in three separate PRs, but they needed to be co-developed, and splitting them out retrospectively turned out to be to hard. The three parts are:

  • Optimised lazy indexing

    • For data in files, a layer in the Dask graph which indexes the array object (such as a NetCDF4Array object) no longer triggers a read from disk. This has to be done within another layer by converting the indexed array object to a numpy array (typically with the new cf_asnyarray function). When an already-indexed array object is indexed again, the two indices are effectively combined to create a new index. E.g. If the array has shape (10, 10), the first index is [:, :], the second index is [::2, [1, 5, 3, 7]], and the third index is [::-2, 1::2], then this is equivalent to the single index [8::-4, 5:8:2]. Only the data for the combined index needs to retrieved from disk, which is a huge performance improvement, as the amount of IO is typically vastly reduced. The same happens for array objects in memory (such as numpy arrays), but for these the performance improvements are negligible.
  • h5netcdf

    • Introducing the ability to use h5netcdf as a backend. Essentially making cf-python compatible with https://github.com/NCAS-CMS/cfdm/pull/307.
  • Active storage reductions

    • Implement the option to do active storage reductions - i.e. to move selected collapse calculations from the client's CPU to a remote server that is "close" (in a network sense) to the data being collapsed.

To help alleviate the difficulties of reviewing a 3-in-1 PR, I have marked up every code change with which of the three parts it relates to, and provided a checklist for all of the changes.

Optimised lazy indexing

  • [ ] cf/docstring/docstring.py:637: # REVIEW: getitem: _docstring_substitution_definitions: 'asanyarray'

  • [ ] cf/test/test_NetCDF4Array.py:138: # REVIEW: getitem: test_NetCDF4Array: test NetCDF4Array.shape
  • [ ] cf/test/test_NetCDF4Array.py:148: # REVIEW: getitem: test_NetCDF4Array: test NetCDF4Array.index

  • [ ] cf/test/test_Data.py:1482: # REVIEW: getitem: test_Data__getitem__: Chained subspaces reading from disk
  • [ ] cf/test/test_Data.py:3293: # REVIEW: getitem: test_Data_rechunk: rechunking after a getitem
  • [ ] cf/test/test_Data.py:4523: # # REVIEW: getitem: test_Data_active_storage: test Data.active_storage
  • [ ] cf/test/test_Data.py:4571: # REVIEW: getitem: test_Data_cull_graph: prevent new asanyarray layer
  • [ ] cf/test/test_Data.py:4830: # REVIEW: getitem: test_Data_is_masked: test Data.is_masked

  • [ ] cf/test/test_Field.py:1469: # REVIEW: getitem: test_Field_indices: make sure works when 'g.array' is not masked
  • [ ] cf/test/test_Field.py:1488: # REVIEW: getitem: test_Field_indices: make sure works when 'g.array' is not masked

  • [ ] cf/test/test_FullArray.py:1:# REVIEW: getitem: test_FullArray: new test module

  • [ ] cf/mixin/propertiesdata.py:4695: # REVIEW: getitem: to_dask_array: new keyword 'asanyarray'

  • [ ] cf/read_write/netcdf/netcdfwrite.py:752: # REVIEW: getitem: _cfa_write_non_standard_terms: set 'asanyarray'
  • [ ] cf/read_write/netcdf/netcdfwrite.py:813: # REVIEW: getitem: _cfa_unique: convert a to a usable array
  • [ ] cf/read_write/netcdf/netcdfwrite.py:969: # REVIEW: getitem: _cfa_aggregation_instructions: set 'asanyarray'

  • [ ] cf/data/dask_regrid.py:5:# REVIEW: getitem: regrid.py: import cf_asanyarray
  • [ ] cf/data/dask_regrid.py:180: # REVIEW: getitem: regrid: convert a to a usable array

  • [ ] cf/data/data.py:47:# REVIEW: getitem: data.py: import cf_asanyarray, cf_filled, cf_is_masked
  • [ ] cf/data/data.py:378: # REVIEW: getitem: __init__: set 'asanyarray'
  • [ ] cf/data/data.py:482: # REVIEW: getitem: __init__: Set whether or not to call np.asanyarray on chunks to convert them to numpy arrays.
  • [ ] cf/data/data.py:516: # REVIEW: getitem: __init__: set 'asanyarray'
  • [ ] cf/data/data.py:639: # REVIEW: getitem: cf_contains: set 'asanyarray'
  • [ ] cf/data/data.py:792: # REVIEW: getitem: __len__: set 'asanyarray'
  • [ ] cf/data/data.py:901: # REVIEW: getitem: __getitem__: set 'asanyarray'
  • [ ] cf/data/data.py:961: # REVIEW: getitem: __getitem__: set 'asanyarray=True' because subspaced chunks might not be in memory
  • [ ] cf/data/data.py:1184: # REVIEW: getitem: __asanyarray__: new property __asanyarray__
  • [ ] cf/data/data.py:1417: # REVIEW: getitem: _set_dask: new keyword 'asanyarray'
  • [ ] cf/data/data.py:1476: # REVIEW: getitem: _set_dask: set 'asanyarray'
  • [ ] cf/data/data.py:2552: # REVIEW: getitem: percentile: set 'asanyarray'
  • [ ] cf/data/data.py:3052: # REVIEW: getitem: percentile: rectify comment
  • [ ] cf/data/data.py:3240: # REVIEW: getitem: rechunk: set 'asanyarray'
  • [ ] cf/data/data.py:3299: # REVIEW: getitem: _asdatetime: set 'asanyarray'
  • [ ] cf/data/data.py:3357: # REVIEW: getitem: _asreftime: set 'asanyarray'
  • [ ] cf/data/data.py:3970: # REVIEW: getitem: _regrid: set 'asanyarray'
  • [ ] cf/data/data.py:4215: # REVIEW: getitem: concatenate: set 'asanyarray'
  • [ ] cf/data/data.py:4248: # REVIEW: getitem: concatenate: define the asanyarray status
  • [ ] cf/data/data.py:4259: # REVIEW: getitem: concatenate: set 'asanyarray'
  • [ ] cf/data/data.py:4905: # REVIEW: getitem: chunks: set 'asanyarray'
  • [ ] cf/data/data.py:4963: # REVIEW: getitem: Units: set 'asanyarray'
  • [ ] cf/data/data.py:5033: # REVIEW: getitem: dtype: set 'asanyarray'
  • [ ] cf/data/data.py:5149: # REVIEW: getitem: is_masked: set 'asanyarray'
  • [ ] cf/data/data.py:5195: # REVIEW: getitem: nbytes: set 'asanyarray'
  • [ ] cf/data/data.py:5232: # REVIEW: getitem: ndim: set 'asanyarray'
  • [ ] cf/data/data.py:5257: # REVIEW: getitem: npartitions: set 'asanyarray'
  • [ ] cf/data/data.py:5281: # REVIEW: getitem: numblocks set 'asanyarray'
  • [ ] cf/data/data.py:5314: # REVIEW: getitem: shape: set 'asanyarray'
  • [ ] cf/data/data.py:5356: # REVIEW: getitem: size set 'asanyarray'
  • [ ] cf/data/data.py:6555: # REVIEW: getitem: convert_reference_time: set 'asanyarray'
  • [ ] cf/data/data.py:6636: # REVIEW: getitem: get_deterministic_name: set 'asanyarray'
  • [ ] cf/data/data.py:6706: # REVIEW: getitem: get_filenames: set 'asanyarray'
  • [ ] cf/data/data.py:6843: # REVIEW: getitem: add_file_location: set 'asanyarray'
  • [ ] cf/data/data.py:8287: # REVIEW: getitem: unique: set 'asanyarray'
  • [ ] cf/data/data.py:9029: # REVIEW: getitem: hardmask: set 'asanyarray'
  • [ ] cf/data/data.py:9152: # REVIEW: getitem: soften_mask: set 'asanyarray'
  • [ ] cf/data/data.py:9184: # REVIEW: getitem: file_locations: set 'asanyarray'
  • [ ] cf/data/data.py:9245: # REVIEW: getitem: filled: set 'asanyarray'
  • [ ] cf/data/data.py:9877: # REVIEW: getitem: to_dask_array: new keyword 'asanyarray'
  • [ ] cf/data/data.py:10209: # REVIEW: getitem: del_file_location: set 'asanyarray'
  • [ ] cf/data/data.py:11394: # REVIEW: getitem: where: set 'asanyarray'
  • [ ] cf/data/data.py:11412: # REVIEW: getitem: where: set 'asanyarray'
  • [ ] cf/data/data.py:11458: # REVIEW: getitem: where: 'da.asanyarray' is no longer required
  • [ ] cf/data/data.py:11685: # REVIEW: getitem: cull_graph: set 'asanyarray'
  • [ ] cf/data/data.py:11957: # REVIEW: getitem: todict: new keywords 'apply_mask_hardness', 'asanyarray'

  • [ ] cf/data/array/h5netcdfarray.py:53: # REVIEW: getitem: _get_array: new method to convert subspace to numpy array.

  • [ ] cf/data/array/mixin/init.py:8:# REVIEW: getitem: __init__.py: import IndexMixin

  • [ ] cf/data/array/mixin/indexmixin.py:10:# REVIEW: getitem: IndexMixin: new mixin class IndexMixin

  • [ ] cf/data/array/netcdf4array.py:50: # REVIEW: getitem: _get_array: Ignore this for h5 review
  • [ ] cf/data/array/netcdf4array.py:51: # REVIEW: getitem: _get_array: new method to convert subspace to numpy array

  • [ ] cf/data/array/umarray.py:174: # REVIEW: getitem: _get_array: new method to convert subspace to numpy array
  • [ ] cf/data/array/umarray.py:275: # REVIEW: getitem: _set_FillValue: record _FillValue in attributes
  • [ ] cf/data/array/umarray.py:370: # REVIEW: getitem: _set_units: record units in attributes
  • [ ] cf/data/array/umarray.py:373: # REVIEW: getitem: _set_unpack: record unpack in attributes

  • [ ] cf/data/array/fullarray.py:124: # REVIEW: getitem: _get_array: new method to convert subspace to numpy array
  • [ ] cf/data/array/fullarray.py:159: # REVIEW: getitem: array: New property to convert subspace to numpy array

  • [ ] cf/data/dask_utils.py:130: # REVIEW: getitem: cf_contains: convert a to a usable array
  • [ ] cf/data/dask_utils.py:166: # REVIEW: getitem: cf_convolve1d: convert a to a usable array
  • [ ] cf/data/dask_utils.py:210: # REVIEW: getitem: cf_harden_mask: convert a to a usable array
  • [ ] cf/data/dask_utils.py:282: # REVIEW: getitem: cf_percentile: convert a to a usable array
  • [ ] cf/data/dask_utils.py:378: # REVIEW: getitem: cf_soften_mask: convert a to a usable array
  • [ ] cf/data/dask_utils.py:436: # REVIEW: getitem: cf_where: convert array, condition, x, y to usable arrays
  • [ ] cf/data/dask_utils.py:573: # REVIEW: getitem: cf_rt2dt: convert a to a usable array
  • [ ] cf/data/dask_utils.py:629: # REVIEW: getitem: cf_dt2rt: convert a to a usable array
  • [ ] cf/data/dask_utils.py:671: # REVIEW: getitem: cf_units: convert a to a usable array
  • [ ] cf/data/dask_utils.py:696: # REVIEW: getitem: cf_is_masked: convert a to a usable array
  • [ ] cf/data/dask_utils.py:730: # REVIEW: getitem: cf_filled: convert a to a usable array
  • [ ] cf/data/dask_utils.py:735:# REVIEW: getitem: cf_asanyarray: convert a to a usable array
  • [ ] cf/data/dask_utils.py:755: # REVIEW: getitem: cf_asanyarray: convert a to a usable array

  • [ ] cf/data/fragment/mixin/fragmentarraymixin.py:15: # REVIEW: getitem: _get_array: new method to convert subspace to numpy array
  • [ ] cf/data/fragment/mixin/fragmentarraymixin.py:169: # REVIEW: getitem: _size_1_axis: refactor to use original_shape

  • [ ] cf/data/fragment/netcdffragmentarray.py:10:# REVIEW: getitem: NetCDFFragmentArray: new inheritance to allow for different netCDF backends
  • [ ] cf/data/fragment/netcdffragmentarray.py:179: # REVIEW: getitem: _get_array: new method to convert subspace to numpy array

  • [ ] cf/data/creation.py:63: # REVIEW: getitem: to_dask: set 'asanyarray'
  • [ ] cf/data/creation.py:87: # REVIEW: getitem: to_dask: The file lock is now on the Array object (in its _get_array method), rather than being set on the Dask array itself.

  • [ ] cf/data/utils.py:877: # REVIEW: getitem: collapse: set 'asanyarray'

  • [ ] cf/functions.py:2449:# REVIEW: getitem: get_subspace: remove deprecated function

h5netcdf

  • [ ] cf/test/test_active_storage.py:1:# REVIEW: h5: test_active_storage.py: new test module

  • [ ] cf/test/test_NetCDF4Array.py:15:# REVIEW: h5: test_NetCDF4Array.py: renamed 'NetCDFArray' to 'NetCDF4Array'
  • [ ] cf/test/test_NetCDF4Array.py:44: # REVIEW: h5: test_NetCDF4Array: renamed 'NetCDFArray' to 'NetCDF4Array'
  • [ ] cf/test/test_NetCDF4Array.py:65: # REVIEW: h5: test_NetCDF4Array: renamed 'NetCDFArray' to 'NetCDF4Array'
  • [ ] cf/test/test_NetCDF4Array.py:76: # REVIEW: h5: test_NetCDF4Array: renamed 'NetCDFArray' to 'NetCDF4Array'
  • [ ] cf/test/test_NetCDF4Array.py:112: # REVIEW: h5: test_NetCDF4Array: renamed 'NetCDFArray' to 'NetCDF4Array'
  • [ ] cf/test/test_NetCDF4Array.py:120: # REVIEW: h5: test_NetCDF4Array: renamed 'NetCDFArray' to 'NetCDF4Array'

  • [ ] cf/test/test_read_write.py:82: # REVIEW: h5: test_read_mask: rename numpy to np
  • [ ] cf/test/test_read_write.py:564: # REVIEW: h5: test_write_datatype: rename numpy to np

  • [ ] cf/read_write/um/umread.py:1976: # REVIEW: h5: create_data: replace units/calendar API with attributes

  • [ ] cf/read_write/write.py:16:# REVIEW: h5: write: docstring improvements

  • [ ] cf/read_write/netcdf/netcdfread.py:212: # REVIEW: h5: _create_data: control caching
  • [ ] cf/read_write/netcdf/netcdfread.py:257: # REVIEW: h5: _create_data: replace units/calendar API with attributes
  • [ ] cf/read_write/netcdf/netcdfread.py:266: # REVIEW: h5: _create_data: don't cache data from CFA variables
  • [ ] cf/read_write/netcdf/netcdfread.py:627: # REVIEW: h5: _create_cfanetcdfarray: docstring/comment improvements
  • [ ] cf/read_write/netcdf/netcdfread.py:702: # REVIEW: h5: _create_cfanetcdfarray: choose the correct netCDF backend
  • [ ] cf/read_write/netcdf/netcdfread.py:756: # REVIEW: h5: _create_cfanetcdfarray_term: fix unknown fragment shape
  • [ ] cf/read_write/netcdf/netcdfread.py:775: # REVIEW: h5: _create_cfanetcdfarray_term: choose the correct netCDF backend
  • [ ] cf/read_write/netcdf/netcdfread.py:965: # REVIEW: h5: _cfa_parse_aggregated_data: use cfdm.netcdf_indexer to get data
  • [ ] cf/read_write/netcdf/netcdfwrite.py:582: # REVIEW: h5: Deleted function _convert_to_builtin_type was a CFA-0.4 thing

  • [ ] cf/read_write/read.py:61: # REVIEW: h5: read: new 'unpack' parameter to control auto-unpacking (previously always True)
  • [ ] cf/read_write/read.py:67: # REVIEW: h5: read: new 'netcdf_backend' parameter to control how to read files
  • [ ] cf/read_write/read.py:69: # REVIEW: h5: read: new 'storage_options' parameter to control access to S3
  • [ ] cf/read_write/read.py:71: # REVIEW: h5: read: 'cache' parameter to control whether or not to get to cache selected data elements

  • [ ] cf/data/array/init.py:4:# REVIEW: h5: __init__.py: import CFAH5netcdfArray
  • [ ] cf/data/array/init.py:7:# REVIEW: h5: __init__.py: import CFAH5netcdfArray
  • [ ] cf/data/array/init.py:12:# REVIEW: h5: __init__.py: import H5netcdfArray
  • [ ] cf/data/array/init.py:16:# REVIEW: h5: __init__.py: import NetCDF4Array

  • [ ] cf/data/array/h5netcdfarray.py:1:# REVIEW: h5: H5netcdfArray: New class to access netCDF with h5netcdf
  • [ ] cf/data/array/h5netcdfarray.py:52: # REVIEW: h5: _get_array: Ignore this for h5 review

  • [ ] cf/data/array/mixin/cfamixin.py:39: # REVIEW: h5: __init__: replace units/calendar API with attributes
  • [ ] cf/data/array/mixin/cfamixin.py:228: # REVIEW: h5: _parse_cfa: Refactoring of code that used to be in __init__
  • [ ] cf/data/array/mixin/cfamixin.py:469: # REVIEW: h5: get_storage_options: new method to get file access options

  • [ ] cf/data/array/netcdfarray.py:1:# REVIEW: h5: NetCDFArray: Replaced by NetCDF4Array

  • [ ] cf/data/array/cfah5netcdfarray.py:1:# REVIEW: h5: CFAH5netcdfArray: New class for accessing CFA with h5netcdf

  • [ ] cf/data/array/locks.py:1:# REVIEW: h5: locks.py: New module to provide file locks

  • [ ] cf/data/array/netcdf4array.py:1:# REVIEW: h5: NetCDF4Array: New class to access netCDF with netCDF4, replaces NetCDFArray

  • [ ] cf/data/array/umarray.py:15: # REVIEW: h5: __init__: replace units/calendar API with attributes

  • [ ] cf/data/array/cfanetcdf4array.py:1:# REVIEW: h5: CFAnetCDF4Array: New class for accessing CFA with netCDF4

  • [ ] cf/data/array/fullarray.py:19: # REVIEW: h5: __init__: replace units/calendar API with attributes

  • [ ] cf/data/fragment/init.py:3:# REVIEW: h5: __init__.py: import H5netcdfFragmentArray
  • [ ] cf/data/fragment/init.py:7:# REVIEW: h5: __init__.py: import NetCDF4FragmentArray

  • [ ] cf/data/fragment/fullfragmentarray.py:12: # REVIEW: h5: __init__: replace units/calendar API with attributes

  • [ ] cf/data/fragment/netcdffragmentarray.py:27: # REVIEW: h5: __init__: replace units/calendar API with attributes

  • [ ] cf/data/fragment/h5netcdffragmentarray.py:5:# REVIEW: h5: H5netcdfFragmentArray: New class to access netCDF fragment with h5netcdf

  • [ ] cf/data/fragment/umfragmentarray.py:12: # REVIEW: h5: __init__: replace units/calendar API with attributes

  • [ ] cf/data/fragment/umfragmentarray.py:13: # REVIEW: h5: __init__: new keyword 'storage_options'

  • [ ] cf/data/fragment/netcdf4fragmentarray.py:5:# REVIEW: h5: NetCDF4FragmentArray: New class to access netCDF fragment with netCDF4

  • [ ] cf/functions.py:2900: # REVIEW: h5: relpath: minor refactor
  • [ ] cf/functions.py:2939: # REVIEW: h5: relpath: minor refactor
  • [ ] cf/functions.py:2979: # REVIEW: h5: relpath: minor refactor
  • [ ] cf/functions.py:3350:# REVIEW: h5: environment: new dependencies

  • [ ] cf/cfimplementation.py:30:# REVIEW: h5: cfimplementation.py: import CFAH5netcdfArray, CFANetCDF4Array, H5netcdfArray,NetCDF4Array
  • [ ] cf/cfimplementation.py:119: # REVIEW: h5: initialise_CFANetCDF4Array: new method to initialise CFANetCDF4Array
  • [ ] cf/cfimplementation.py:136: # REVIEW: h5: initialise_CFAH5netcdfArray: new method to initialise CFAH5netcdfArray

  • [ ] cf/regrid/regridoperator.py:730: # REVIEW: h5: new name and location of file lock
  • [ ] cf/regrid/regrid.py:2469: # REVIEW: h5: new name and location of file lock

Active storage

  • [ ] cf/test/test_functions.py:50: # REVIEW: active: test_configuration: test cf.active_storage, cf.active_storage_url, cf.active_storage_max_requests

  • [ ] cf/data/collapse/collapse_active.py:1:# REVIEW: active: collapse_active.py: new module for active storage functionality

  • [ ] cf/data/collapse/dask_collapse.py:1:# REVIEW: active: dask_collapse.py: all unlabelled changes in this module are general tidying, and should be reviewed at the same time as active storage
  • [ ] cf/data/collapse/dask_collapse.py:235:# REVIEW: active: cf_mean_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:382:# REVIEW: active: cf_max_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:537:# REVIEW: active: cf_min_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:644:# REVIEW: active: cf_range_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:758:# REVIEW: active: cf_rms_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:843:# REVIEW: active: cf_sample_size_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:957:# REVIEW: active: cf_sum_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:1093:# REVIEW: active: cf_sum_of_weights_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:1137:# REVIEW: active: cf_sum_of_weights2_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:1183:# REVIEW: active: cf_unique_chunk: active storage decoration
  • [ ] cf/data/collapse/dask_collapse.py:1248:# REVIEW: active: cf_var_chunk: active storage decoration

  • [ ] cf/data/array/mixin/init.py:1:# REVIEW: active: __init__.py: import ActiveStorageMixin

  • [ ] cf/data/array/mixin/arraymixin.py:21: # REVIEW: active: _meta: Moved to here from FileArrayMixin

  • [ ] cf/data/array/mixin/activestoragemixin.py:1:# REVIEW: active: ActiveStorageMixin: new mixin class ActiveStorageMixin

  • [ ] cf/data/creation.py:86: # REVIEW: active: to_dask: '_dask_meta' renamed to '_meta' for consistency with Dask

  • [ ] cf/data/utils.py:866: # REVIEW: active: collapse: pass the active storage status onto the collapse functions
  • [ ] cf/data/utils.py:995: # REVIEW: active: parse_weights: minor refactor

  • [ ] cf/functions.py:166:# REVIEW: active: configuration: new keywords 'active_storage', 'active_storage_url', 'active_storage_max_requests'
  • [ ] cf/functions.py:428:# REVIEW: active: _configuration: new keywords 'active_storage', 'active_storage_url', 'active_storage_max_requests'
  • [ ] cf/functions.py:587:# REVIEW: active: regrid_logging: new examples
  • [ ] cf/functions.py:719:# REVIEW: active: relaxed_identities: new examples
  • [ ] cf/functions.py:853:# REVIEW: active: tempdir: new examples
  • [ ] cf/functions.py:1205:# REVIEW: active: active_storage: new function
  • [ ] cf/functions.py:1275:# REVIEW: active: active_storage_url: new function
  • [ ] cf/functions.py:1340:# REVIEW: active: active_storage_max_requests: new function

  • [ ] cf/field.py:5211: # REVIEW: active: active storage docstring
  • [ ] cf/field.py:7092: # REVIEW: active: collapse: include size 1 axes in collapse

  • [ ] cf/constants.py:66: # REVIEW: active: CONSTANTS: new constants 'active_storage', 'active_storage_url', 'active_storage_max_requests'

davidhassell avatar Aug 22 '24 12:08 davidhassell

From the review:

ERROR: test_configuration (test_functions.functionTest.test_configuration)

Traceback (most recent call last): File "/home/slb93/git-repos/cf-python/cf/functions.py", line 1265, in _parse from activestorage import Active # noqa: F401 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ModuleNotFoundError: No module named 'activestorage'

Now nicely handled: 050913581485d628a4bce914bd1da1b0c22e638f

davidhassell avatar Oct 22 '24 09:10 davidhassell

From the https://github.com/NCAS-CMS/cf-python/pull/805#pullrequestreview-2373806503:

I get eight test_Field_regrid_* tests failing on these assertions:

These test all pass for me ... I wonder what's going on, here?

davidhassell avatar Oct 22 '24 09:10 davidhassell

From the https://github.com/NCAS-CMS/cf-python/pull/805#pullrequestreview-2373806503:

====================================================================== ERROR: test_GENERAL (test_general.generalTest.test_GENERAL)

File "/home/slb93/git-repos/cfdm/cfdm/read_write/netcdf/netcdfwrite.py", line 2476, in _createVariable g["nc"][ncvar] = g["netcdf"].createVariable(**kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "src/netCDF4/_netCDF4.pyx", line 2967, in netCDF4._netCDF4.Dataset.createVariable File "src/netCDF4/_netCDF4.pyx", line 4305, in netCDF4._netCDF4.Variable.init ValueError: chunksize cannot exceed dimension size

This, it turns out, is a bug in cfdm relating to the setting of HDF5 chunksizes for scalar arrays, which shall be fixed over there ...

EDIT: No, it's a bug in cf-python, afterall! Fixed in b72c17bde48bb5f9f7f5e0d60ebbca2a0f29195e. (Note that all will get refactored (for the better) when we daskify cfdm.)

davidhassell avatar Oct 22 '24 09:10 davidhassell

From the https://github.com/NCAS-CMS/cf-python/pull/805#pullrequestreview-2373806503:

Consistency

This wasn't introduced in this PR, but when checking the inheritance structure I noticed that one of the collapse methods has an inconsistent argument ordering relative to the others namely means_abs having weights before axis unlike all others which accept weights (see below), so it would be good to make that consistent here.

I agree. That's one for 4.0.0 perhaps (as it would be a breaking API change). Do you agree?

davidhassell avatar Oct 22 '24 09:10 davidhassell

From the https://github.com/NCAS-CMS/cf-python/pull/805#pullrequestreview-2373806503:

Names wise, I would prefer we rename the classes with H5netcdf in to H5NetCDF which follows the others in using 'NetCDF' capitalisation and follow Python Pascal Case style:

I would agree with you in general! Here, h5netcdf and netCDF4 are proper nouns, being the names of libraries, so I in this case I would say that they camel case to H5netcdf and NetCDF4 respectively.

davidhassell avatar Oct 22 '24 09:10 davidhassell

From https://github.com/NCAS-CMS/cf-python/pull/805#pullrequestreview-2382057240:

More generally, there are a lot of methods missing from the new Array classes API docs:

Fixed: 885a67d44b658b071e971beea288a0935b782275

./check_docs_api_coverage now only flags up the 8 methods that we expect (relating to UGRID, and we are still deciding what do about!)

davidhassell avatar Oct 23 '24 08:10 davidhassell

Hi Sadie - I think I've responding to everything - over to you :)

davidhassell avatar Oct 23 '24 09:10 davidhassell

Thanks for addressing my feedback. I'll reply to your response comments here - though in general if I have put the 'thumbs up' emoji on a comment I am happy and won't comment further!

These test all pass for me ... I wonder what's going on, here?

This time round (after fetching your new commits) they all pass, so either something fixed them or we can treat them as flaky and ignore those failures. Flaky tests are of course a problem in themselves, but not to concern us for this PR. :)

EDIT: No, it's a bug in cf-python, afterall! Fixed in https://github.com/NCAS-CMS/cf-python/commit/b72c17bde48bb5f9f7f5e0d60ebbca2a0f29195e. (Note that all will get refactored (for the better) when we daskify cfdm.)

Great, always good to fix a bug! test_general now all passes for me locally.

I agree. That's one for 4.0.0 perhaps (as it would be a breaking API change). Do you agree?

Sure, shall I open an Issue for it and add it to #776?

./check_docs_api_coverage now only flags up the 8 methods that we expect (relating to UGRID, and we are still deciding what do about!)

Nice, thanks for sorting the rest. Do you know what the '6 missing .rst files' warning is about?:

$ ./check_docs_api_coverage                                                           ─╯
Method cf.Field.del_mesh_id not in docs/source/class/cf.Field.rst
Method cf.Field.get_mesh_id not in docs/source/class/cf.Field.rst
Method cf.Field.has_mesh_id not in docs/source/class/cf.Field.rst
Method cf.Field.set_mesh_id not in docs/source/class/cf.Field.rst
Method cf.Domain.del_mesh_id not in docs/source/class/cf.Domain.rst
Method cf.Domain.get_mesh_id not in docs/source/class/cf.Domain.rst
Method cf.Domain.has_mesh_id not in docs/source/class/cf.Domain.rst
Method cf.Domain.set_mesh_id not in docs/source/class/cf.Domain.rst
File {rst_file} does not exist
File {rst_file} does not exist
File {rst_file} does not exist
File {rst_file} does not exist
File {rst_file} does not exist
File {rst_file} does not exist
Traceback (most recent call last):
  File "/home/slb93/git-repos/cf-python/docs/source/check_docs_api_coverage.py", line 78, in <module>
    raise ValueError(
ValueError: Found 8 undocumented methods and 6 missing .rst files

It may not concern additions from this PR, and can always be sorted at release-time, but in case it is to do with the PR it would be good to do a little investigation.

I would agree with you in general! Here, h5netcdf and netCDF4 are proper nouns, being the names of libraries, so I in this case I would say that they camel case to H5netcdf and NetCDF4 respectively.

Fair point. In which case, they can stay that way, no problem. I respect a grammatical justification!

As for the in-line comments, I have a small number left to respond to, and I will mark a small number you migjt have missed as well. Then we should be good to merge, though I'll approve the PR again to confirm :)

sadielbartholomew avatar Oct 23 '24 15:10 sadielbartholomew

I have responded to everything now, back to you @davidhassell...

sadielbartholomew avatar Oct 23 '24 15:10 sadielbartholomew

Hi Sadie - all done, now, I think!

I'm not sure about:

/home/slb93/git-repos/cf-python/cf/docstring/docstring.py:306: SyntaxWarning: invalid escape sequence '\_'
  "{{split_every: `int` or `dict`, optional}}": """split_every: `int` or `dict`, optional

as I don't see it when I run pre-commit, I'm using Python 3.12.2 - what about you?

davidhassell avatar Oct 28 '24 13:10 davidhassell

Excellent, thanks @davidhassell.

I don't see it when I run pre-commit, I'm using Python 3.12.2 - what about you?

I am using 3.12.0. No worries, we can investigate separately since it points to a line you didn't touch so is very likely unrelated to this PR. Everything is ready I think so please merge!

sadielbartholomew avatar Oct 29 '24 09:10 sadielbartholomew

A great many thanks to @sadielbartholomew for her customary careful and insightful review of this the huge PR. Much appreciated. Merging now.

davidhassell avatar Oct 30 '24 08:10 davidhassell