Ability to read arrays into existing memory
I have a data model where an array is divided over multiple ASDF files. Sometimes one wants to work on the files (pieces of the array) one at a time, but often one wants to load several files into a contiguous array in memory. Right now I read the files one by one and then concatenate them, but I could save a few seconds if ASDF could read arrays directly into a pre-allocated array, perhaps via an out parameter, like:
```python
a = np.empty(N)
i = 0
for fn in fns:
    with AsdfFile(fn) as af:
        nread = af['data'].read(out=a[i:])
        i += nread
```
Do you think something like this would be possible?
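To make the difference concrete, here is a minimal, self-contained sketch of the two approaches in plain NumPy, with the per-file ASDF reads stubbed out as in-memory chunks (the `chunks` list and sizes are illustrative, not part of the asdf API):

```python
import numpy as np

# Stand-ins for the arrays that would be read from individual ASDF files.
chunks = [np.arange(3, dtype=float), np.arange(3, 6, dtype=float)]

# Current approach: read each piece, then concatenate (an extra full copy).
concatenated = np.concatenate(chunks)

# Proposed approach: copy each piece directly into a pre-allocated buffer.
total = sum(c.size for c in chunks)
a = np.empty(total)
i = 0
for c in chunks:
    a[i:i + c.size] = c  # what read(out=a[i:]) would do, minus the temporary
    i += c.size

assert np.array_equal(a, concatenated)
```

The payoff of the proposed API is that the copy into `a` could happen straight from disk, skipping the intermediate per-file arrays entirely.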
I don't see why not, but I'll wait for Ed to return to see if he sees any issues with that (he's on vacation until the end of next week).
I'm not opposed to the idea, but I will point out some potential pitfalls:
- The NDArrayType object is a proxy for the underlying np.ndarray, so it deliberately has few public instance methods, to avoid conflicting with np.ndarray methods. np.ndarray currently has no read method, so this isn't a problem today, but it could become one in the future.
- When a user adds an np.ndarray to an AsdfFile's tree, the object is not automatically converted to NDArrayType on assignment; it only becomes an NDArrayType once the file is written and re-read, so the new method wouldn't be available immediately. It seems confusing to have an interface that changes depending on how the AsdfFile object was created...
In that case, perhaps it could be a method of the AsdfFile or the tree, rather than of NDArrayType? Something like:
```python
with asdf.open(fn) as af:
    tupleofkeys = ('data', 'a', 'b')  # if the data is at af.tree['data']['a']['b']
    nread = af.tree.read(tupleofkeys, out=a[i:])  # or af.tree.load()?
```
This could also be aliased as NDArrayType.read()/load(), to provide a shorthand for cases where you know you're reading a file from disk.
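As a rough sketch of what such a tree-level method might do internally (the helper name `read_into`, the key-tuple walk, and the assumption of a 1-D out buffer are all hypothetical here, not asdf API):

```python
import numpy as np

def read_into(tree, keys, out):
    # Walk the nested tree down to the array node.
    node = tree
    for k in keys:
        node = node[k]
    arr = np.asarray(node)
    n = arr.size
    # Copy into the caller's pre-allocated buffer (assumed 1-D here);
    # a real implementation would read from disk straight into out.
    out[:n] = arr.ravel()
    return n

# Usage with a plain dict standing in for af.tree:
tree = {'data': {'a': {'b': np.arange(4, dtype=float)}}}
buf = np.empty(10)
nread = read_into(tree, ('data', 'a', 'b'), buf[0:])
```

Returning the element count, as in the original proposal, lets the caller advance its offset into the shared buffer.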