RS_FromNetCDF fails on NetCDF4/HDF5 with filter type=32015 (e.g. ZSTD)
I encountered a runtime failure when using RS_FromNetCDF(content, 'v') on a valid NetCDF4 file stored in S3.
The error is:
Exception occurred while evaluating expression RS_FromNetCDF - inputs: [[B@xxxx, v], cause: Unknown filter type=32015
This happens because the file uses an HDF5 compression filter with ID 32015, which is registered to Zstandard (ZSTD; BLOSC is 32001), and that filter is not supported by the version of netcdf-java (4.6.11) used in netcdfAll.
Steps to reproduce:
Load a .nc file that uses ZSTD compression (HDF5 filter 32015) into Spark via the binaryFile data source.
Try to evaluate RS_FromNetCDF(content, 'variable').
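For reference, the two steps above could look roughly like this in PySpark. This is a minimal sketch, assuming Sedona's Python package is installed and the S3 connector is configured; the helper names `netcdf_expr` and `reproduce` are made up for illustration.

```python
def netcdf_expr(variable: str, column: str = "content") -> str:
    """Build the Sedona SQL expression that decodes one NetCDF variable."""
    return f"RS_FromNetCDF({column}, '{variable}')"


def reproduce(path: str, variable: str) -> None:
    """Load NetCDF bytes via binaryFile and evaluate RS_FromNetCDF on them."""
    # Imported lazily so netcdf_expr() stays usable without Spark installed.
    from sedona.spark import SedonaContext

    sedona = SedonaContext.create(SedonaContext.builder().getOrCreate())
    # binaryFile exposes each file's raw bytes in the `content` column.
    df = sedona.read.format("binaryFile").load(path)
    # The failure only surfaces once an action forces evaluation.
    df.selectExpr(netcdf_expr(variable)).show()
```

Calling `reproduce` with the path of a ZSTD-compressed NetCDF4 file and the variable name 'v' should trigger the exception shown above.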
A workaround is to re-encode the file using nccopy or xarray to remove unsupported filters, but this limitation is undocumented and leads to runtime crashes.
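A minimal sketch of the nccopy variant of this workaround, wrapped in Python. It assumes that the installed nccopy accepts `-F none` to suppress all output filters (please check `man nccopy` for your version) and that nccopy itself can decode ZSTD, which typically requires the ZSTD plugin to be discoverable through HDF5_PLUGIN_PATH. The function names are hypothetical.

```python
import shutil
import subprocess
from typing import List


def nccopy_command(src: str, dst: str) -> List[str]:
    """Return an nccopy invocation that writes dst without HDF5 filters."""
    # Assumption: `-F none` tells nccopy to drop all filters on output.
    return ["nccopy", "-F", "none", src, dst]


def strip_filters(src: str, dst: str) -> None:
    """Re-encode src into dst so that no unsupported filter remains."""
    if shutil.which("nccopy") is None:
        raise RuntimeError("nccopy not found on PATH")
    # check=True raises CalledProcessError if nccopy exits with an error.
    subprocess.run(nccopy_command(src, dst), check=True)
```

The output file is uncompressed, so it will be larger than the original, but it no longer depends on filter 32015.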
Expected:
An informative exception or warning.
Ideally: support for filter 32015
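To illustrate what a more informative message could contain, here is a small Python sketch that extracts the filter ID from the current error text and maps it to a name. It is not Sedona or netcdf-java code; the table is only a subset of the HDF5 registered filter IDs, and `explain_filter_error` is a hypothetical helper.

```python
import re
from typing import Optional

# Subset of HDF5 filter IDs (built-in filters and a few registered plugins).
KNOWN_FILTERS = {
    1: "deflate (zlib)",
    2: "shuffle",
    3: "fletcher32",
    4: "szip",
    307: "bzip2",
    32001: "Blosc",
    32004: "LZ4",
    32015: "Zstandard (ZSTD)",
}

_FILTER_RE = re.compile(r"Unknown filter type=(\d+)")


def explain_filter_error(message: str) -> Optional[str]:
    """Return a readable hint for an 'Unknown filter type' error, else None."""
    match = _FILTER_RE.search(message)
    if match is None:
        return None
    filter_id = int(match.group(1))
    name = KNOWN_FILTERS.get(filter_id, "an unrecognized filter")
    return (
        f"The file uses HDF5 filter {filter_id} ({name}), which the bundled "
        "netcdf-java cannot decode; re-encode the file without this filter."
    )
```

Applied to the error above, this names filter 32015 as Zstandard instead of leaving the user to look up the ID.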
Environment:
Sedona: 1.7.1
Spark: 3.5.0
File: NetCDF4/HDF5 with ZSTD filter
@BruAPAHE Thanks for reporting. I believe netcdf-java recently added support for this filter. Would you like to create a PR to upgrade the dependency?