python-blosc2

Feature request: `concatenate` for `NDArray` (C-API/Python) without decompression

Open · rivershah opened this issue · 0 comments

I'd like to request a native `concatenate` function for `blosc2.NDArray`, exposed in both the C-API and the Python wrapper.

Requirements:

  • C-API: Provide a C function that concatenates compressed b2nd arrays along a specified axis.
  • Python API: Wrap the C function, mimicking the `numpy.concatenate` signature (a sequence of arrays plus an `axis` parameter). Ref: NumPy docs for `numpy.concatenate`.
  • Core constraint: Must operate directly on the compressed data, avoiding full decompression/recompression.
  • Assumptions: Input arrays share identical compression settings (codec, clevel and the rest of cparams).
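To make the requirements above more concrete, here is a rough Python sketch of the checks such a function might run before touching any compressed data. It follows the `numpy.concatenate` shape rules and adds one assumption of my own: every input except the last must end on a chunk boundary along `axis`, since otherwise its padded edge chunk would have to be decompressed and refilled. `concat_result_shape` is a hypothetical helper, not part of python-blosc2, and dtype and cparams are assumed equal as stated above.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, ...]


def concat_result_shape(
    shapes: Sequence[Shape],
    chunks: Sequence[Shape],
    blocks: Sequence[Shape],
    axis: int = 0,
) -> Shape:
    """Return the shape numpy.concatenate would produce, after checking that
    the inputs could be joined chunk-for-chunk without recompression.

    Hypothetical helper: these rules are assumptions, not python-blosc2 API.
    """
    if not shapes:
        raise ValueError("need at least one array to concatenate")
    if not len(shapes) == len(chunks) == len(blocks):
        raise ValueError("shapes, chunks and blocks must have the same length")
    ndim = len(shapes[0])
    if not -ndim <= axis < ndim:
        raise ValueError(f"axis {axis} is out of bounds for {ndim}-d arrays")
    axis %= ndim  # normalize negative axes, as NumPy does

    for i, shape in enumerate(shapes):
        if len(shape) != ndim:
            raise ValueError(f"array {i} has {len(shape)} dims, expected {ndim}")
        # Same rule as numpy.concatenate: all other axes must match exactly.
        for d in range(ndim):
            if d != axis and shape[d] != shapes[0][d]:
                raise ValueError(f"array {i} differs from array 0 on axis {d}")
        # Compressed chunks can only be reused if the partitioning matches.
        if tuple(chunks[i]) != tuple(chunks[0]) or tuple(blocks[i]) != tuple(blocks[0]):
            raise ValueError(f"array {i} uses different chunk/block shapes")

    # Every input except the last must end on a chunk boundary along `axis`;
    # otherwise its padded edge chunk would need to be decompressed and refilled.
    step = chunks[0][axis]
    for i, shape in enumerate(shapes[:-1]):
        if shape[axis] % step != 0:
            raise ValueError(
                f"array {i} has length {shape[axis]} on axis {axis}, "
                f"not a multiple of the chunk length {step}"
            )

    out = list(shapes[0])
    out[axis] = sum(shape[axis] for shape in shapes)
    return tuple(out)
```

For example, `concat_result_shape([(4, 6), (3, 6)], [(2, 3)] * 2, [(1, 3)] * 2)` returns `(7, 6)`, while changing the first shape to `(5, 6)` raises `ValueError`, because the first array's last chunk would be only half full.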

Use Case:

Efficiently join large, pre-compressed datasets in both low-level C applications and Python without the performance penalty of decompression/recompression. Exposing it in the C-API is key for broader integration.

`blosc2.SChunk.insert_chunk` is very fast. Could this feature be built on top of it, together with the corresponding metadata updates?
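If `insert_chunk` is the building block, the main question is where each compressed chunk would land in the result. The pure-Python sketch below computes that placement. It assumes (not verified against the b2nd source) that an NDArray stores its chunks in its SChunk in C order over the chunk grid, and that all inputs share the same chunk and block shapes. `chunk_grid` and `concat_chunk_plan` are hypothetical helpers.

```python
import bisect
import itertools
import math
from typing import List, Sequence, Tuple


def chunk_grid(shape: Sequence[int], chunks: Sequence[int]) -> Tuple[int, ...]:
    """Number of chunks along each axis (edge chunks are padded, so round up)."""
    return tuple(math.ceil(s / c) for s, c in zip(shape, chunks))


def concat_chunk_plan(
    shapes: Sequence[Sequence[int]], chunks: Sequence[int], axis: int = 0
) -> List[Tuple[int, int]]:
    """Return (source array, source chunk index) for each chunk of the result,
    in the order the chunks would be stored in the result's SChunk.

    Hypothetical helper. Assumes chunks are stored in C order over the chunk
    grid and that the inputs already passed the compatibility checks.
    """
    axis %= len(shapes[0])
    grids = [chunk_grid(shape, chunks) for shape in shapes]
    # Chunk-grid offset at which each source starts along `axis`.
    starts = list(itertools.accumulate((g[axis] for g in grids), initial=0))
    out_grid = list(grids[0])
    out_grid[axis] = starts[-1]

    plan = []
    # Walk the result's chunk grid in C order (last axis varies fastest).
    for coord in itertools.product(*(range(n) for n in out_grid)):
        # Which source owns this position along `axis`?
        src = bisect.bisect_right(starts, coord[axis]) - 1
        local = list(coord)
        local[axis] -= starts[src]
        # Linear (C-order) index of that chunk inside its source array.
        idx = 0
        for c, n in zip(local, grids[src]):
            idx = idx * n + c
        plan.append((src, idx))
    return plan
```

Under this assumption, concatenating along axis 0 keeps each source's chunks contiguous, so it reduces to appending the second array's compressed chunks after the first's. Along any other axis the chunks interleave (for two `(4, 4)` arrays with `(2, 2)` chunks on axis 1, the plan starts `(0, 0), (0, 1), (1, 0), (1, 1), (0, 2), ...`), so each chunk would have to be placed at its computed position. The shape stored in the array metadata would still need a separate update.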

Thanks for considering this.

rivershah, Apr 11 '25 12:04