Support `np.memmap`
I am trying to migrate some large datasets to NRRD, and specifically to pynrrd, and one thing I'd like is for `nrrd.read_data` to support `np.memmap`. All my data is saved raw, without compression, and when I have gigabytes of data in a file it's much faster to `np.memmap` it and read just the segment I need, rather than loading everything into memory first.
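A minimal sketch of the motivating use case: map a large raw (uncompressed) file and pull out a single slice, so only the touched pages are read from disk. The file here is plain raw samples with no header, purely for illustration.

```python
import os
import tempfile

import numpy as np

# Create a large raw file of float64 samples (stand-in for raw NRRD data).
path = os.path.join(tempfile.mkdtemp(), "big.raw")
np.arange(1_000_000, dtype=np.float64).tofile(path)

# Map the file instead of loading it; mode="r" keeps it read-only.
mm = np.memmap(path, dtype=np.float64, mode="r", shape=(1000, 1000))

# Materialize just one row; the OS only pages in the bytes actually touched.
segment = np.array(mm[500])
```

This is the behavior being requested from `nrrd.read_data`: skip the up-front full read and let the OS page in data on demand.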
Definitely sounds like a useful feature. Some questions/comments.
Sounds like it should only apply when the encoding is raw. Should it apply to ASCII/text encodings? Not sure that makes sense.
Should the memmap be read-only? It's definitely odd to call `nrrd.read()` and get back a mutable object backed by the file. I'm particularly concerned about the size changing (is that possible with a memmap?)
I'm leaning towards a separate function like read_memmap instead of adapting read.
I'm also unclear about Fortran vs. C-style ordering, but hopefully the `order` parameter in `np.memmap` would suffice.
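For reference on the read-only question, a quick sketch of how the relevant `np.memmap` modes behave: `mode="r"` raises on assignment, while `mode="c"` (copy on write) allows in-memory modification without ever touching the file.

```python
import os
import tempfile

import numpy as np

# A small raw file to map.
path = os.path.join(tempfile.mkdtemp(), "data.raw")
np.arange(10, dtype=np.float32).tofile(path)

# Read-only map: any assignment raises ValueError.
ro = np.memmap(path, dtype=np.float32, mode="r")
try:
    ro[0] = 99.0
except ValueError:
    pass  # "assignment destination is read-only"

# Copy-on-write map: assignment succeeds, but only in memory.
cow = np.memmap(path, dtype=np.float32, mode="c")
cow[0] = 99.0

# The file on disk is unchanged.
fresh = np.fromfile(path, dtype=np.float32)
```

So a memmap never changes the file's size, and with either of these two modes it never changes the file's contents.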
@addisonElliott I agree it

- should only be used on raw. I don't think it's suitable for compressed files, since you need to read everything into memory to decompress before accessing the data, right? Personally I don't use any compression because the compression factor on my data is negligible and it slows down IO by an order of magnitude.
- should be read-only with respect to the file, but we could actually use the "copy on write" mode (`np.memmap(..., mode="c")`); this way values can be modified in memory but are never written back to the file. I think this more closely matches the behavior of `nrrd.read`, which just reads into memory. Or just use read-only `np.memmap(..., mode="r")` to reduce the cognitive complexity.
- probably warrants a separate function like `read_memmap`. Alternatively, an optional parameter like `nrrd.read(memmap=True)` that's only honored when the encoding is "raw" could work.
- I'm pretty sure the `order` parameter in `np.memmap` would suffice.
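A rough sketch of what the separate function could look like. The name and signature are assumptions, not existing pynrrd API; a real implementation would take `dtype`, `shape`, and the data offset from `nrrd.read_header` rather than from the caller, but the caller supplies them here so the sketch stays self-contained.

```python
import numpy as np


def read_memmap(filename, dtype, shape, data_offset, mode="c"):
    """Return a memmap over the raw data section of a file.

    Hypothetical helper. mode="c" (copy on write) lets callers modify
    values in memory without them ever being flushed to disk, mirroring
    the mutability of arrays returned by nrrd.read; pass mode="r" for a
    strictly read-only view. order="F" because NRRD stores the
    fastest-varying axis first.
    """
    return np.memmap(filename, dtype=np.dtype(dtype), mode=mode,
                     offset=data_offset, shape=tuple(shape), order="F")
```

With `mode="c"` as the default, the returned array behaves like the in-memory result of `nrrd.read` for callers who mutate it, while still reading lazily from disk.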