Ability to read arrays into existing memory

Open lgarrison opened this issue 4 years ago • 3 comments

I have a data model where an array is divided over multiple ASDF files. Sometimes one wants to work on files (pieces of the array) one at a time, but often one wants to load multiple files into a contiguous array in memory. Right now, I just read the files one-by-one, then concatenate them, but I could save a few seconds if ASDF could read arrays into a pre-allocated array, maybe with an out parameter, like:

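import asdf
import numpy as np

a = np.empty(N)  # N: total number of elements across all files
i = 0
for fn in fns:
    with asdf.open(fn) as af:
        # proposed API: read(out=...) does not exist yet
        nread = af['data'].read(out=a[i:])
    i += nread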

Do you think something like this would be possible?
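
For comparison, here is a minimal sketch of what can be done today without the proposed `out` parameter: copy each piece into a slice of a pre-allocated array instead of concatenating. The helper name `fill_preallocated` is hypothetical, and the in-memory `chunks` list stands in for the per-file arrays (in practice each chunk would be something like `af['data']` from an opened file). This only avoids the extra allocation made by the concatenate step; each piece is still read and then copied, so it is not the direct read requested above.

```python
import numpy as np

def fill_preallocated(chunks, out):
    """Copy each 1-D chunk into consecutive slices of `out`.

    Hypothetical helper, not part of asdf. Returns the number of
    elements written.
    """
    i = 0
    for chunk in chunks:
        arr = np.asarray(chunk)
        n = arr.shape[0]
        if i + n > out.shape[0]:
            raise ValueError("out is too small for the supplied chunks")
        out[i:i + n] = arr  # copy into the existing buffer
        i += n
    return i

# Stand-ins for the arrays stored in separate files.
chunks = [np.arange(3.0), np.arange(3.0, 7.0)]
a = np.empty(7)
nread = fill_preallocated(chunks, a)
```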

lgarrison, Aug 04 '21 16:08

I don't see why not, but I'll wait for Ed to return to see if he sees any issues with that (he's on vacation until the end of next week).

perrygreenfield, Aug 04 '21 16:08

I'm not opposed to the idea, but I will point out some potential pitfalls:

  • The NDArrayType object is a proxy for the underlying np.ndarray, so it has limited public instance methods to avoid conflicting with np.ndarray methods. There isn't currently a read method on np.ndarray so this isn't a problem now but it may be in the future.

  • When a user adds an np.ndarray to an AsdfFile's tree, the object is not automatically converted to NDArrayType on assignment. It's not until the file is written and re-read that the object becomes NDArrayType, which means the new method won't be available immediately. Seems confusing to have an interface that changes depending on how the AsdfFile object was created...

eslavich, Aug 16 '21 17:08

In that case, perhaps it could be a method of the AsdfFile or the tree, rather than the NDArrayType? Such as:

with asdf.open(fn) as af:
    tupleofkeys = ('data', 'a', 'b')  # if the data is at af.tree['data']['a']['b']
    nread = af.tree.read(tupleofkeys, out=a[i:])  # or af.tree.load()?

This could also be aliased by NDArrayType.read()/load(), to provide a shorthand for cases where you know you're reading a file from disk.
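
A rough sketch of the lookup-by-key-tuple idea, using a plain nested dict in place of a real tree. The function `read_into` is hypothetical and only illustrates the proposed calling convention (walk the keys, copy into the front of `out`, return the element count); it says nothing about how asdf would read the block from disk.

```python
from functools import reduce
import numpy as np

def read_into(tree, keys, out):
    """Hypothetical helper: copy tree[k0][k1]... into the front of `out`.

    Returns the number of elements copied along the first axis.
    """
    node = reduce(lambda d, k: d[k], keys, tree)  # walk the nested keys
    arr = np.asarray(node)
    n = arr.shape[0]
    out[:n] = arr  # writes through to the caller's buffer if `out` is a view
    return n

# A nested dict stands in for af.tree here.
tree = {"data": {"a": {"b": np.array([1.0, 2.0, 3.0])}}}
a = np.zeros(5)
i = 0
i += read_into(tree, ("data", "a", "b"), a[i:])
```

Because `a[i:]` is a view, the copy lands in the original array, and the returned count advances the offset for the next file.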

lgarrison, Aug 16 '21 18:08