
Unable to combine .nc files using ep.combine_echodata()

Open tkeffer opened this issue 4 months ago • 2 comments

I am probably doing something stupid, but I have been unable to figure this one out. My goal is to combine several netCDF files into one, but I've been getting an error:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/tkeffer/git/westernflyer/ek80/issue.py", line 10, in <module>
    combined_ed.to_netcdf("test.nc")
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/echopype/echodata/echodata.py", line 612, in to_netcdf
    return to_file(
           ^^^^^^^^
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/echopype/convert/api.py", line 88, in to_file
    _save_groups_to_file(
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/echopype/convert/api.py", line 118, in _save_groups_to_file
    io.save_file(
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/echopype/utils/io.py", line 72, in save_file
    ds.to_netcdf(path=path, mode=mode, group=group, encoding=encoding, **kwargs)
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/xarray/core/dataset.py", line 2102, in to_netcdf
    return to_netcdf(  # type: ignore[return-value]  # mypy cannot resolve the overloads:(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/xarray/backends/api.py", line 2107, in to_netcdf
    dump_to_store(
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/xarray/backends/api.py", line 2157, in dump_to_store
    store.store(variables, attrs, check_encoding, writer, unlimited_dims=unlimited_dims)
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/xarray/backends/common.py", line 529, in store
    self.set_variables(
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/xarray/backends/common.py", line 567, in set_variables
    target, source = self.prepare_variable(
                     ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/xarray/backends/netCDF4_.py", line 582, in prepare_variable
    encoding = _extract_nc4_variable_encoding(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tkeffer/git/westernflyer/ek80/venv/lib/python3.12/site-packages/xarray/backends/netCDF4_.py", line 311, in _extract_nc4_variable_encoding
    raise ValueError(
ValueError: unexpected encoding parameters for 'netCDF4' backend: ['szip', 'zstd', 'bzip2', 'blosc', 'preferred_chunks']. Valid encodings are: {'fletcher32', 'zlib', '_FillValue', 'szip_coding', 'least_significant_digit', 'chunksizes', 'endian', 'shuffle', 'significant_digits', 'contiguous', 'blosc_shuffle', 'dtype', 'quantize_mode', 'szip_pixels_per_block', 'complevel', 'compression'}

Environment:

Ubuntu 24.04
Python 3.12.5
echopype 0.10.1
pandas 2.3.2
numpy 1.26.4
xarray 2025.9.0
Sonar: EK80

To reproduce

Sample data set

This historical sample data set (about 300 MB) will reproduce the problem, but I have had the same problem with my own data.

wget https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/Bell_M._Shimada/SH2209/EK80/Express-D20220904-T011422.raw \
     https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/Bell_M._Shimada/SH2209/EK80/Express-D20220904-T012014.raw \
     https://noaa-wcsd-pds.s3.amazonaws.com/data/raw/Bell_M._Shimada/SH2209/EK80/Express-D20220904-T012606.raw

Convert

I then converted the raw files to .nc files with this script:

import glob

import echopype as ep

raw_files = glob.glob("*.raw")
for raw_file in raw_files:
    ed = ep.open_raw(raw_file, sonar_model="EK80", use_swap=True)
    ed.to_netcdf(raw_file.replace(".raw", ".nc"))

Combine

I then tried to combine the .nc files with this script, which raises the error above on the last line:

import glob
import echopype as ep

ed_filenames = sorted(glob.glob("*.nc"))
ed_list = []
for ed_filename in ed_filenames:
    ed_list.append(ep.open_converted(ed_filename))

combined_ed = ep.combine_echodata(ed_list)
combined_ed.to_netcdf("test.nc")

tkeffer, Sep 11 '25 00:09

Hi @tkeffer,

Indeed, I can reproduce the error too! I don't use the NetCDF format much with echopype, but let's take it step by step:

  1. In the meantime, if you do not strictly need NetCDF, you could use the Zarr format instead: combining the files and saving the combined result works with Zarr.

  2. Coming back to the NetCDF problem, to be sure we’re on the same page:

  • After combined_ed = ep.combine_echodata(ed_list) we can still access and plot the data:
ds_Sv_nc = ep.calibrate.compute_Sv(combined_ed, waveform_mode="CW", encode_mode="power")
ds_Sv_nc["Sv"].plot(x="ping_time", row="channel", col_wrap=3, vmin=-80, vmax=-30, cmap="RdYlBu_r", yincrease=False)
  • The problem only emerges at the end, when we try to save the combined data to a single .nc file. The error is associated with variable encodings/attributes that the netCDF4 engine doesn’t accept. From what I understand, when EchoData objects are read and combined, some variables end up carrying Zarr-style encoding keys (szip, zstd, bzip2, blosc and preferred_chunks in the traceback above). These keys are valid for Zarr but invalid for the netCDF4 backend.

  • Separately, it seems that combine_echodata() sets Provenance.attrs["is_combined"] = True. The netCDF4 library doesn’t have a boolean attribute type, so xarray raises an error when writing it.
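
The encoding side of the problem can be illustrated without echopype at all. Below is a minimal sketch, assuming a variable's encoding is an ordinary dict: it keeps only the keys that the error message lists as valid for the netCDF4 backend and reports the rest. The function name strip_invalid_nc4_encoding and the sample dict are hypothetical, not part of echopype or xarray.

```python
# Valid keys copied from the ValueError message in the traceback
# (xarray 2025.9.0, netCDF4 backend). Other versions may differ.
VALID_NC4_ENCODINGS = {
    "fletcher32", "zlib", "_FillValue", "szip_coding", "least_significant_digit",
    "chunksizes", "endian", "shuffle", "significant_digits", "contiguous",
    "blosc_shuffle", "dtype", "quantize_mode", "szip_pixels_per_block",
    "complevel", "compression",
}


def strip_invalid_nc4_encoding(encoding):
    """Return (cleaned, dropped): a copy holding only valid keys, and the sorted rejected keys."""
    cleaned = {k: v for k, v in encoding.items() if k in VALID_NC4_ENCODINGS}
    dropped = sorted(k for k in encoding if k not in VALID_NC4_ENCODINGS)
    return cleaned, dropped


# Hypothetical encoding dict resembling what the failing variable carried.
sample = {
    "zlib": True, "complevel": 4, "dtype": "float64",
    "szip": False, "zstd": False, "bzip2": False, "blosc": False,
    "preferred_chunks": {"ping_time": 1000},
}
cleaned, dropped = strip_invalid_nc4_encoding(sample)
print("kept:", sorted(cleaned))
print("dropped:", dropped)
```

If this approach were applied to each variable's encoding dict before writing, it would presumably preserve settings such as zlib and complevel, whereas clearing the whole encoding discards them. I have not tested that against echopype itself.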

I looked briefly but could not find where those encodings/attrs get attached. For now, aside from using the Zarr format, one temporary workaround is to clear the encodings and convert the boolean attributes manually:

import numpy as np

# 1) wipe all per-variable encodings so xarray won't see Zarr-ish keys
for grp in sorted(combined_ed.group_paths):
    path = "/" if grp == "Top-level" else grp
    try:
        ds = combined_ed._tree[path].ds
    except KeyError:
        continue
    for v in ds.variables:
        ds[v].encoding.clear()   # <- removes 'blosc', 'zstd', 'preferred_chunks', etc.
    # also fix any boolean attrs (netCDF4 can't store bool attrs)
    for k, v in list(ds.attrs.items()):
        if isinstance(v, (bool, np.bool_)):
            ds.attrs[k] = int(v)

# 2) write
combined_ed.to_netcdf("./data/combined_nc/combined.nc", overwrite=True)

This still needs further investigation, as the encodings appear even on a single .nc file, without the combine_echodata() call.
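
To narrow this down, one option is to list the offending keys per variable right after opening a single file. Here is a minimal sketch, assuming the encodings have first been gathered into a plain mapping of variable name to encoding dict (with xarray, something like {name: var.encoding for name, var in ds.variables.items()}). The function find_invalid_encodings and the sample data are hypothetical.

```python
# Keys the netCDF4 backend rejected in the traceback above.
REJECTED_KEYS = {"szip", "zstd", "bzip2", "blosc", "preferred_chunks"}


def find_invalid_encodings(encodings_by_var):
    """Map each variable name to the sorted rejected keys found in its encoding."""
    report = {}
    for name, encoding in encodings_by_var.items():
        # Set intersection with a dict compares against the dict's keys.
        bad = sorted(REJECTED_KEYS.intersection(encoding))
        if bad:
            report[name] = bad
    return report


# Hypothetical encodings for two variables in one group.
encodings = {
    "backscatter_r": {"zlib": True, "blosc": False, "preferred_chunks": {"ping_time": 500}},
    "ping_time": {"dtype": "int64"},
}
report = find_invalid_encodings(encodings)
for name, bad in report.items():
    print(f"{name}: {bad}")
```

Running such a check once on a freshly opened file and once on the combined object might show whether the keys are introduced by open_converted() or by combine_echodata().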

Hope it helps for now!

LOCEANlloydizard, Sep 11 '25 15:09

For reference: this also seems connected to issues #1092, #479 and #975, and to PR #1042.

LOCEANlloydizard, Sep 11 '25 15:09