intake-esm

Bug/local kerchunk engine

Open chiaweh2 opened this issue 2 months ago • 6 comments

Change Summary

This pull request improves support for opening kerchunk datasets with the kerchunk engine, particularly distinguishing between local and remote reference files. It also adds new tests to ensure the correct behavior for both remote HTTPS URLs and local files when using the kerchunk engine.

Related issue number

Related to PR #758. This fixes the local-access bug introduced in that PR and adds unit tests for both local and remote reference file access using the kerchunk engine.

Checklist

  • [x] Unit tests for the changes exist
  • [ ] Tests pass on CI
  • [ ] Documentation reflects the changes where applicable

Enhancements to dataset opening logic:

  • Updated the _open_dataset function in source.py so that fsspec is not used to load local reference files when the engine is set to kerchunk. The code now checks that xarray_open_kwargs['engine'] != 'kerchunk' before calling fsspec.open_local.
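
The guard described above can be sketched roughly as follows. This is an illustration only, not the actual code in source.py: the helper name resolve_reference_path and the injected open_local callable are hypothetical, any other conditions the real function applies before calling fsspec.open_local are omitted, and .get("engine") is used here so that a missing engine key does not raise.

```python
from typing import Any, Callable


def resolve_reference_path(
    urlpath: str,
    xarray_open_kwargs: dict[str, Any],
    open_local: Callable[[str], str],
) -> str:
    """Hypothetical helper: decide whether to localize a path before opening it.

    `open_local` stands in for fsspec.open_local and is injected so the
    sketch runs without fsspec installed.
    """
    # Only route the path through the local-file helper when the engine
    # is NOT kerchunk; the kerchunk engine receives the path untouched.
    if xarray_open_kwargs.get("engine") != "kerchunk":
        return open_local(urlpath)
    return urlpath


if __name__ == "__main__":
    # A stand-in for fsspec.open_local that just tags the path.
    def fake_open_local(path: str) -> str:
        return "/cache/" + path

    print(resolve_reference_path("ref.json", {"engine": "kerchunk"}, fake_open_local))
    print(resolve_reference_path("data.nc", {"engine": "netcdf4"}, fake_open_local))
```

With the kerchunk engine the reference path is handed through unchanged, while any other engine still goes through the local-file helper.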

Expanded test coverage:

  • Added test_open_dataset_kerchunk_engine to verify that remote kerchunk reference files accessed via HTTPS are properly opened with the kerchunk engine.
  • Added test_open_dataset_kerchunk_engine_local to verify that local kerchunk reference files are handled correctly, including testing with specific storage_options for remote S3 access.

chiaweh2, Dec 14 '25 04:12

Thanks for the quick review! I did add the kerchunk package in the previous PR #758 in requirement.txt. Or is the test environment set somewhere else? Thanks!

chiaweh2, Dec 15 '25 16:12

> Thanks for the quick review! I did add the kerchunk package in the previous PR #758 in requirement.txt. Or is the test environment set somewhere else? Thanks!

It needs to be included here: https://github.com/intake/intake-esm/blob/main/ci/environment-upstream-dev.yml

The testing infrastructure does not install all the dependencies listed in the requirements.txt file.

mgrover1, Dec 15 '25 16:12

Looks like python3.10 is in conflict with the kerchunk version for the python3.10 CI test.

chiaweh2, Dec 15 '25 17:12

> Looks like python3.10 is in conflict with the kerchunk version for the python3.10 CI test.

Can we drop python3.10 in a separate PR and close this issue?

  • https://github.com/intake/intake-esm/issues/762

andersy005, Dec 15 '25 18:12

@mgrover1 @andersy005 just checking to see do I need to do anything for the failed testing? Many thanks!

chiaweh2, Dec 17 '25 19:12

> @mgrover1 @andersy005 just checking to see do I need to do anything for the failed testing? Many thanks!

Correct! I am planning on submitting a PR tonight with the fix; then we can just merge the main branch in here.

mgrover1, Dec 17 '25 19:12