
RS_FromNetCDF fails on NetCDF4/HDF5 with filter type=32015 (e.g. ZSTD)

Open BruAPAHE opened this issue 7 months ago • 2 comments

I encountered a runtime failure when using RS_FromNetCDF(content, 'v') on a valid NetCDF4 file stored in S3.

The error is:

Exception occurred while evaluating expression RS_FromNetCDF - inputs: [[B@xxxx, v], cause: Unknown filter type=32015

This happens because the file uses an HDF5 compression filter (Zstandard, registered HDF5 filter ID 32015; BLOSC is 32001), which is not supported by the version of netcdf-java (4.6.11) used in netcdfAll.

Steps to reproduce:

Load a .nc file that uses ZSTD compression (filter 32015) into Spark via the binaryFile data source.

Try to evaluate RS_FromNetCDF(content, 'variable').
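The two steps above can be sketched in PySpark roughly as follows. This is a minimal sketch, not the exact code from my job: the helper names `netcdf_expr` and `reproduce`, the path argument, and wrapping the call in RS_NumBands (only to force evaluation and get a printable value) are assumptions for illustration.

```python
def netcdf_expr(variable):
    """Build the SQL expression that triggers the failure (hypothetical helper)."""
    return f"RS_NumBands(RS_FromNetCDF(content, '{variable}')) AS bands"


def reproduce(path, variable="v"):
    """Load the file as raw bytes and evaluate RS_FromNetCDF on it.

    `path` is a placeholder for a NetCDF4 file that uses filter 32015.
    """
    # Imported lazily so the helper above can be used without a Spark install.
    from sedona.spark import SedonaContext

    config = SedonaContext.builder().getOrCreate()
    sedona = SedonaContext.create(config)

    # binaryFile exposes the file bytes in the `content` column.
    df = sedona.read.format("binaryFile").load(path)

    # Evaluating the expression is where "Unknown filter type=32015" surfaces.
    df.selectExpr(netcdf_expr(variable)).show()
```

Calling `reproduce(...)` with the path of an affected file and the variable name should surface the exception shown above.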

A workaround is to re-encode the file using nccopy or xarray to remove unsupported filters, but this limitation is undocumented and leads to runtime crashes.
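For reference, the xarray variant of the workaround might look like the sketch below. It assumes the local netCDF4/HDF5 installation can read the ZSTD filter (for example through an HDF5 plugin), and it rewrites every numeric variable with plain zlib (deflate) compression, which I am assuming the bundled netcdf-java can read. The function names and the default compression level are illustrative, not part of Sedona.

```python
def deflate_encoding(var_kinds, complevel=4):
    """Map each numeric variable to a plain zlib (deflate) encoding.

    `var_kinds` maps variable name -> NumPy dtype kind ('i', 'u', 'f', ...).
    Non-numeric variables (e.g. strings) are left out on purpose.
    """
    return {
        name: {"zlib": True, "complevel": complevel}
        for name, kind in var_kinds.items()
        if kind in ("i", "u", "f")
    }


def reencode(src, dst, complevel=4):
    """Rewrite `src` to `dst` without the ZSTD filter (hypothetical helper)."""
    # Imported lazily; requires xarray and netCDF4 with ZSTD read support.
    import xarray as xr

    # decode_cf=False keeps raw values and attributes (_FillValue, scale_factor)
    # untouched, so only the storage filters change.
    with xr.open_dataset(src, engine="netcdf4", decode_cf=False) as ds:
        kinds = {name: var.dtype.kind for name, var in ds.variables.items()}
        ds.to_netcdf(
            dst,
            format="NETCDF4",
            engine="netcdf4",
            encoding=deflate_encoding(kinds, complevel),
        )
```

The per-variable encoding passed to `to_netcdf` replaces the encoding read from the source file, which is what drops the ZSTD setting.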

Expected:

An informative exception or warning.

Ideally: support for filter 32015.

Environment:

Sedona: 1.7.1

Spark: 3.5.0

File: NetCDF4/HDF5 with ZSTD filter

BruAPAHE, May 21 '25 15:05

Thank you for your interest in Apache Sedona! We appreciate you opening your first issue. Contributions like yours help make Apache Sedona better.

github-actions[bot], May 21 '25 15:05

@BruAPAHE Thanks for reporting. I believe netcdf-java recently introduced support for this filter. Do you want to create a PR to upgrade the dependency?

jiayuasu, May 21 '25 16:05