Interface inconsistency for methods `write()` and `update()` in AsdfFile
Situation
My situation is where process 1 saves a dict with numpy array (representing a 2-channel audio file) to an .asdf file. Then process 2 (at an arbitrary later time) wants to open it, add an element, and write that back to the file. It seems that AsdfFile.update would be the correct function for that, but in the documentation I cannot find how we're supposed to use it. I only found it mentioned twice (without any details on how to use it):
- https://asdf.readthedocs.io/en/latest/asdf/arrays.html#saving-inline-arrays
- https://asdf.readthedocs.io/en/latest/asdf/features.html#schema-validation
Attempt
How I imagine it being used:
import numpy as np
import asdf
# sample data
np_arr0 = np.arange(200000, dtype=np.float32).reshape(2, -1) # (2, 100000)
np_arr1 = np.arange(200000, 360000, dtype=np.float32).reshape(2, -1) # (2, 80000)
# You can also make the AsdfFile first, and modify its tree directly:
ff = asdf.AsdfFile()
ff[0] = np_arr0
ff.write_to("test_append.asdf") # write to ASDF formatted .yaml
# Attempt 1: expand dict
ff_append = asdf.open("test_append.asdf", mode='rw')
ff_append[1] = np_arr1
ff_append.update("test_append.asdf") # write update to file
# Attempt 2: same code, but using 'with' statement
with asdf.open("test_append.asdf", mode='rw') as ff_append:
ff_append[1] = np_arr1
ff_append.update("test_append.asdf")
However, both results in the error: ValueError: Invalid value for all_array_storage: 'test_append.asdf'
Full error traceback (CLICK ME)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-12-c5fd0dea2ed6> in <module>
----> 1 ff_append.update("test_append.asdf")
/opt/miniconda3/envs/audio_tester/lib/python3.7/site-packages/asdf/asdf.py in update(self, all_array_storage, all_array_compression, auto_inline, pad_blocks, include_block_index, version)
918
919 self._pre_write(fd, all_array_storage, all_array_compression,
--> 920 auto_inline)
921
922 try:
/opt/miniconda3/envs/audio_tester/lib/python3.7/site-packages/asdf/asdf.py in _pre_write(self, fd, all_array_storage, all_array_compression, auto_inline)
777 raise ValueError(
778 "Invalid value for all_array_storage: '{0}'".format(
--> 779 all_array_storage))
780 self._all_array_storage = all_array_storage
781
ValueError: Invalid value for all_array_storage: 'test_append.asdf'
Questions
- How do I extend an existing .asdf file?
- Can
asdf.Streambe used with arbitrary sizes (e.g.np_arr0andnp_arr1in my example)? The documentation mentions only the case with entry size known.
System
- OS: Ubuntu 18.04
- Filesystem: Local and/or NAS
import platform; print(platform.python_version()) # Python: 3.7.5
import numpy as np; print(np.__version__) # 1.17.3
import asdf; print(asdf.__version__) # 2.4.2
@NumesSanguis I'm at an out of town meeting this week but will try to get you an answer early next week if someone else at STScI doesn't answer it sooner.
@NumesSanguis I wasn't sure how you were attempting to append. Normally, I would append a new array to the tree so I have the following example that does something similar:
import numpy as np
import asdf
# this assumes you want the arrays in a list structure
ff.tree["data"] = [np.arange(200000, dtype=np.float32).reshape(2, -1)]
ff.write_to("test_append.asdf")
ff.close()
ff_append = asdf.open("test_append.asdf", mode="rw")
ff_append.tree["data"].append(np.arange(200000, dtype=np.float32).reshape(2, -1))
ff_append.update()
ff_append.close()
ff = asdf.open("test_append.asdf")
print(ff.tree["data"][1].shape)
should work. Note that if you are updating, you don't need to supply the filename again. Does this work for you?
Thanks, this works! I had the wrong concept that asdf.open("test_append.asdf", mode="rw") gives me a Python dict. Instead it gives me a asdf.AsdfFile() object, which already internally has the path set (with fd = self._fd).
I think what was confusing to me is that asdf.AsdfFile() is mostly an object-orientated approach, however write_to is more functional. While both write_to() and update() are doing a similar operation, their interface is different.
To me it would be more consistent if the functions work like this (set the path in the object, instead of write function):
# create object, then write data
ff = asdf.AsdfFile(fd="test.asdf") # fd is optional
ff[0] = np_arr0
ff.write()
# continue modifying tree
ff[1] = np_arr0
ff.update()
ff.close()
# update path and write another file
ff.file_destination("foo.asdf") # internally update self._fd
ff.write()
# this also allows for the same pattern for reading with a new value 'w' (deletes file if exists)
with asdf.open("spam.asdf", mode='w') as ff_w:
ff_w[0] = np_arr0
ff_w.write()
ff_w[1] = np_arr1
ff_w.update()
This would also streamline the internals of AsdfFile() after .write() and after open():
ff = asdf.AsdfFile()
ff[0] = np_arr0
# Expectation: keep reference to written file destination
ff.write_to("test_append.asdf")
ff[1] = np_arr1
ff.update()
# Reality: ValueError: Can not update, since there is no associated file
Full traceback(CLICK ME)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-1a1b47761739> in <module>
1 ff[1] = np_arr1
----> 2 ff.update()
/opt/miniconda3/envs/audio_tester/lib/python3.7/site-packages/asdf/asdf.py in update(self, all_array_storage, all_array_compression, auto_inline, pad_blocks, include_block_index, version)
893 if fd is None:
894 raise ValueError(
--> 895 "Can not update, since there is no associated file")
896
897 if not fd.writable():
ValueError: Can not update, since there is no associated file
In this approach, the .write() method, could still have the argument fd, but it would be optional. For consistency, .update() would also accept a fd argument. These would have the same functionality as .file_destination().
Edit: To prevent the need of a new function .file_destination() that only updates ._fd, .write() could still have fd as optional argument, but inside there is a check that ._fd cannot be None if this function is invoked.