Support `np.memmap`
I am trying to migrate some large datasets to NRRD, and specifically to pynrrd, and one thing I'd like is for `nrrd.read_data` to support `np.memmap`. All my data is saved raw, without compression, and when I have gigabytes of data in a file it's much faster to `np.memmap` it and read just the segment I need, rather than loading everything into memory first.
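A minimal sketch of the motivating use case: map a large raw (uncompressed) file and pull out a single slice, so only the touched pages are read from disk. The file here is plain raw samples with no header, purely for illustration.

```python
import os
import tempfile

import numpy as np

# Create a large raw file of float64 samples (stand-in for raw NRRD data).
path = os.path.join(tempfile.mkdtemp(), "big.raw")
np.arange(1_000_000, dtype=np.float64).tofile(path)

# Map the file instead of loading it; mode="r" keeps it read-only.
mm = np.memmap(path, dtype=np.float64, mode="r", shape=(1000, 1000))

# Materialize just one row; the OS only pages in the bytes actually touched.
segment = np.array(mm[500])
```

This is the behavior being requested from `nrrd.read_data`: skip the up-front full read and let the OS page in data on demand.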
Definitely sounds like a useful feature. Some questions/comments.
Sounds like it should only apply when the encoding is raw. Should it apply to ASCII/text encodings? Not sure that makes sense.
Should the memmap be read-only? It's definitely odd to call `nrrd.read()` and get back a mutable object backed by the file. I'm particularly concerned about the size changing (is that possible with a memmap?)
I'm leaning towards a separate function like read_memmap instead of adapting read.
I'm also unclear about Fortran vs. C-style ordering, but hopefully the `order` parameter in `np.memmap` would suffice.
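For reference on the read-only question, a quick sketch of how the relevant `np.memmap` modes behave: `mode="r"` raises on assignment, while `mode="c"` (copy on write) allows in-memory modification without ever touching the file.

```python
import os
import tempfile

import numpy as np

# A small raw file to map.
path = os.path.join(tempfile.mkdtemp(), "data.raw")
np.arange(10, dtype=np.float32).tofile(path)

# Read-only map: any assignment raises ValueError.
ro = np.memmap(path, dtype=np.float32, mode="r")
try:
    ro[0] = 99.0
except ValueError:
    pass  # "assignment destination is read-only"

# Copy-on-write map: assignment succeeds, but only in memory.
cow = np.memmap(path, dtype=np.float32, mode="c")
cow[0] = 99.0

# The file on disk is unchanged.
fresh = np.fromfile(path, dtype=np.float32)
```

So a memmap never changes the file's size, and with either of these two modes it never changes the file's contents.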
@addisonElliott I agree it

- should only be used on raw. I don't think it's suitable for compressed files, since you need to read everything into memory to decompress before accessing the data, right? Personally I don't use any compression because the compression factor on my data is negligible and it slows down IO by an order of magnitude.
- should be read-only with respect to the file, but we could actually use the "copy on write" mode (`np.memmap(..., mode="c")`); this way values can be modified in memory but are never written back to the file. I think this more closely matches the behavior of `nrrd.read`, which just reads into memory. Or just use read-only `np.memmap(..., mode="r")` to reduce the cognitive complexity.
- probably warrants a separate function like `read_memmap`. Alternatively, an optional parameter like `nrrd.read(memmap=True)` that's only honored when the encoding is "raw" could work.
- I'm pretty sure the `order` parameter in `np.memmap` would suffice.
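A rough sketch of what the separate function could look like. The name and signature are assumptions, not existing pynrrd API; a real implementation would take `dtype`, `shape`, and the data offset from `nrrd.read_header` rather than from the caller, but the caller supplies them here so the sketch stays self-contained.

```python
import numpy as np


def read_memmap(filename, dtype, shape, data_offset, mode="c"):
    """Return a memmap over the raw data section of a file.

    Hypothetical helper. mode="c" (copy on write) lets callers modify
    values in memory without them ever being flushed to disk, mirroring
    the mutability of arrays returned by nrrd.read; pass mode="r" for a
    strictly read-only view. order="F" because NRRD stores the
    fastest-varying axis first.
    """
    return np.memmap(filename, dtype=np.dtype(dtype), mode=mode,
                     offset=data_offset, shape=tuple(shape), order="F")
```

With `mode="c"` as the default, the returned array behaves like the in-memory result of `nrrd.read` for callers who mutate it, while still reading lazily from disk.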