`open_dataset` should fail early, before reaching `guess_engine` when file doesn't exist
What happened?
I ran an old script that uses xr.open_dataset, gave it a path to a file that doesn't exist, and got this error:
ValueError: did not find a match in any of xarray's currently installed IO backends ['h5netcdf', 'scipy', 'zarr']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
https://docs.xarray.dev/en/stable/user-guide/io.html
This made me think that maybe xarray had been updated and the engine-guessing algorithm had changed. But apparently I was wrong, and the file passed to open_dataset simply doesn't exist.
What did you expect to happen?
open_dataset to fail earlier, saying there is no file at the given path, and to include the path in the error message.
Minimal Complete Verifiable Example
import xarray as xr
xr.show_versions()
xr.open_dataset("missing.h5")
Steps to reproduce
I cannot use uv to demonstrate the issue (I use NixOS), but the issue should hopefully be simple enough that you'll forgive me. It might be that installing an HDF5 backend is required, but I'm not sure.
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
- [ ] Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
ValueError: did not find a match in any of xarray's currently installed IO backends ['h5netcdf', 'scipy', 'zarr']. Consider explicitly selecting one of the installed engines via the ``engine`` parameter, or installing additional IO dependencies, see:
https://docs.xarray.dev/en/stable/getting-started-guide/installing.html
https://docs.xarray.dev/en/stable/user-guide/io.html
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.13.8 (main, Oct 7 2025, 12:01:51) [GCC 14.3.0]
python-bits: 64
OS: Linux
OS-release: 6.12.56
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_IL
LOCALE: ('en_IL', 'ISO8859-1')
libhdf5: 1.14.6
libnetcdf: None

xarray: 2025.7.1
pandas: 2.3.1
numpy: 2.3.3
scipy: 1.16.2
netCDF4: None
pydap: None
h5netcdf: 1.6.4
h5py: 3.14.0
zarr: 3.1.1
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: 3.10.5
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: 0.25.0.dev0
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: 1.17.1
IPython: None
sphinx: None
Thanks for writing this up @doronbehar! Just to be clear, xarray is failing to guess a backend based on the file extension, so the failure to guess is not related to the fact that the file doesn't exist. It's just that .h5 doesn't map to any backends. You can check this by trying the following:
import xarray as xr
xr.open_dataset("missing.nc")
You'll notice that the "netcdf4" (or, in a different env, possibly "scipy" or "h5netcdf") backend is guessed and you end up with the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/jsignell/xarray/missing.nc'
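For what it's worth, explicitly selecting an engine sidesteps the guessing step entirely, and then the backend itself reports the missing file. A minimal sketch, assuming the h5netcdf backend is installed (the exact exception type may vary by backend, so it catches OSError):

import xarray as xr

# Passing `engine` explicitly skips guessing, so the backend tries to
# open the file and reports that it is missing.
try:
    xr.open_dataset("missing.h5", engine="h5netcdf")
except OSError as err:  # FileNotFoundError is a subclass of OSError
    print(type(err).__name__, err)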
I agree conceptually that it would be nice to provide a missing-file error rather than an "engine not guessed" error. But it's a little weird to try to fail before choosing an engine, because different engines support different file locations. One engine might only support local files, while another might also support https or s3. So in the abstract it's actually kind of hard to know what it means for a file to "not exist". For the local case it might be fine to add a dedicated error message for when the file doesn't exist, but we wouldn't even want to add that for the remote case, because that would require pinging the file, which means an additional request.
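To make that concrete, a naive local existence check is meaningless for remote stores (nothing beyond the standard library here; the s3 path is a made-up example):

import os

# A plain existence check only makes sense for local paths; for a
# remote store it reports "missing" even when the object exists.
print(os.path.exists("missing.h5"))             # False: genuinely missing
print(os.path.exists("s3://bucket/data.zarr"))  # False, regardless of the bucket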
It's just that .h5 doesn't map to any backends.
That's surprising, as that seems like an official extension for HDF5 files, according to Wikipedia.
You can check this by trying the following:
It's nice to know that this issue is at least specific to the .h5 extension.
One engine might only support local files and another might also support https or s3. So in the abstract it's actually kind of hard to know what it means for a file to "not exist".
Shouldn't it be pretty easy to detect these URI prefixes in the input string?
but we wouldn't even want to add that for the remote case because that would require pinging the file which would require an additional request.
I agree :).
That's surprising, as that seems like an official extension for HDF5 files
indeed, but netcdf is a subset of HDF5, so it is at least questionable whether netcdf backends should claim to know how to read .h5 files.
Shouldn't it be pretty easy to detect these URI prefixes in the input string?
possibly, but where do you stop? s3 and https are pretty standard, but if we support these, should we also support others?
In general, I think the easiest might be to extend the "failed to guess" error message to something like ... Make sure the file exists and is readable, and consider explicitly selecting one of the installed engines ... (ideally we'd find a wording that doesn't extend the sentence, which is already pretty long).
Shouldn't it be pretty easy to detect these URI prefixes in the input string?
Sure, but the broader concept is that it is the responsibility of the engine to handle the input. It doesn't have to be a string at all. It can be an object, for instance.
I like @keewis's idea of adding a bit to the error message.
Shouldn't it be pretty easy to detect these URI prefixes in the input string?
Sure, but the broader concept is that it is the responsibility of the engine to handle the input.
Maybe we can detect any URI and in that case let the engine handle it? If the object is not a URI at all then we can check if it is an existing file.
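Something like this sketch is what I have in mind (hypothetical helper names, not actual xarray API):

import os
from urllib.parse import urlsplit

def looks_like_uri(path: str) -> bool:
    # Treat anything with a URL scheme (s3://, https://, ...) as a URI
    # and leave it to the engine. A Windows drive letter like "C:" also
    # parses as a scheme, so require at least two characters.
    return len(urlsplit(path).scheme) > 1

def check_before_guessing(path) -> None:
    # Hypothetical pre-check, not xarray's current behavior: only
    # local-looking string paths are required to exist up front.
    if isinstance(path, str) and not looks_like_uri(path) and not os.path.exists(path):
        raise FileNotFoundError(f"No such file or directory: {path!r}")

check_before_guessing("s3://bucket/data.zarr")  # passes: left to the engine
check_before_guessing("missing.h5")             # raises FileNotFoundError with the path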
That's surprising, as that seems like an official extension for HDF5 files
indeed, but netcdf is a subset of HDF5, so it is at least questionable whether netcdf backends should claim to know how to read .h5 files.
Hmm, I see. I think the netcdf backend should claim it can try to read .h5 files. It will succeed if the file is structured as a netcdf file, and will fail with an ugly error otherwise (which is OK IMO).
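For example, a sketch assuming the h5netcdf backend is installed: a netCDF-structured file with an .h5 extension opens fine once the engine is given explicitly, even though extension-based guessing fails:

import numpy as np
import xarray as xr

ds = xr.Dataset({"t": ("x", np.arange(3))})
ds.to_netcdf("data.h5", engine="h5netcdf")  # netCDF-4 structure, .h5 extension

# xr.open_dataset("data.h5") would fail to guess an engine, but the
# explicit engine reads the file without trouble:
reopened = xr.open_dataset("data.h5", engine="h5netcdf")
print(reopened)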