asdf icon indicating copy to clipboard operation
asdf copied to clipboard

Interface inconsistency for methods `write()` and `update()` in AsdfFile

Open NumesSanguis opened this issue 6 years ago • 3 comments

Situation

My situation is where process 1 saves a dict with numpy array (representing a 2-channel audio file) to an .asdf file. Then process 2 (at an arbitrary later time) wants to open it, add an element, and write that back to the file. It seems that AsdfFile.update would be the correct function for that, but in the documentation I cannot find how we're supposed to use it. I only found it mentioned twice (without any details on how to use it):

  • https://asdf.readthedocs.io/en/latest/asdf/arrays.html#saving-inline-arrays
  • https://asdf.readthedocs.io/en/latest/asdf/features.html#schema-validation

Attempt

How I imagine it being used:

import numpy as np
import asdf

# sample data
np_arr0 = np.arange(200000, dtype=np.float32).reshape(2, -1)  # (2, 100000)
np_arr1 = np.arange(200000, 360000, dtype=np.float32).reshape(2, -1)  # (2, 80000)

# You can also make the AsdfFile first, and modify its tree directly:
ff = asdf.AsdfFile()
ff[0] = np_arr0
ff.write_to("test_append.asdf")  # write to ASDF formatted .yaml

# Attempt 1: expand dict
ff_append = asdf.open("test_append.asdf", mode='rw')
ff_append[1] = np_arr1
ff_append.update("test_append.asdf")  # write update to file

# Attempt 2: same code, but using 'with' statement
with asdf.open("test_append.asdf", mode='rw') as ff_append:
    ff_append[1] = np_arr1
    ff_append.update("test_append.asdf")

However, both results in the error: ValueError: Invalid value for all_array_storage: 'test_append.asdf'

Full error traceback (CLICK ME)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-c5fd0dea2ed6> in <module>
----> 1 ff_append.update("test_append.asdf")

/opt/miniconda3/envs/audio_tester/lib/python3.7/site-packages/asdf/asdf.py in update(self, all_array_storage, all_array_compression, auto_inline, pad_blocks, include_block_index, version)
    918 
    919         self._pre_write(fd, all_array_storage, all_array_compression,
--> 920                         auto_inline)
    921 
    922         try:

/opt/miniconda3/envs/audio_tester/lib/python3.7/site-packages/asdf/asdf.py in _pre_write(self, fd, all_array_storage, all_array_compression, auto_inline)
    777             raise ValueError(
    778                 "Invalid value for all_array_storage: '{0}'".format(
--> 779                     all_array_storage))
    780         self._all_array_storage = all_array_storage
    781 

ValueError: Invalid value for all_array_storage: 'test_append.asdf'

Questions

  1. How do I extend an existing .asdf file?
  2. Can asdf.Stream be used with arbitrary sizes (e.g. np_arr0 and np_arr1 in my example)? The documentation mentions only the case with entry size known.

System

  • OS: Ubuntu 18.04
  • Filesystem: Local and/or NAS
import platform; print(platform.python_version())  # Python: 3.7.5
import numpy as np; print(np.__version__)  # 1.17.3
import asdf; print(asdf.__version__)  # 2.4.2

NumesSanguis avatar Dec 12 '19 02:12 NumesSanguis

@NumesSanguis I'm at an out of town meeting this week but will try to get you an answer early next week if someone else at STScI doesn't answer it sooner.

perrygreenfield avatar Dec 12 '19 14:12 perrygreenfield

@NumesSanguis I wasn't sure how you were attempting to append. Normally, I would append a new array to the tree so I have the following example that does something similar:

import numpy as np
import asdf
# this assumes you want the arrays in a list structure
ff.tree["data"] = [np.arange(200000, dtype=np.float32).reshape(2, -1)]
ff.write_to("test_append.asdf")
ff.close()
ff_append = asdf.open("test_append.asdf", mode="rw")
ff_append.tree["data"].append(np.arange(200000, dtype=np.float32).reshape(2, -1))
ff_append.update()
ff_append.close()
ff = asdf.open("test_append.asdf")
print(ff.tree["data"][1].shape)

should work. Note that if you are updating, you don't need to supply the filename again. Does this work for you?

perrygreenfield avatar Dec 12 '19 14:12 perrygreenfield

Thanks, this works! I had the wrong concept that asdf.open("test_append.asdf", mode="rw") gives me a Python dict. Instead it gives me a asdf.AsdfFile() object, which already internally has the path set (with fd = self._fd).

I think what was confusing to me is that asdf.AsdfFile() is mostly an object-orientated approach, however write_to is more functional. While both write_to() and update() are doing a similar operation, their interface is different. To me it would be more consistent if the functions work like this (set the path in the object, instead of write function):

# create object, then write data
ff = asdf.AsdfFile(fd="test.asdf")  # fd is optional
ff[0] = np_arr0
ff.write()
# continue modifying tree
ff[1] = np_arr0
ff.update()
ff.close()
# update path and write another file
ff.file_destination("foo.asdf")  # internally update self._fd
ff.write()

# this also allows for the same pattern for reading with a new value 'w' (deletes file if exists)
with asdf.open("spam.asdf", mode='w') as ff_w:
    ff_w[0] = np_arr0
    ff_w.write()
    ff_w[1] = np_arr1
    ff_w.update()

This would also streamline the internals of AsdfFile() after .write() and after open():

ff = asdf.AsdfFile()
ff[0] = np_arr0
# Expectation: keep reference to written file destination
ff.write_to("test_append.asdf")

ff[1] = np_arr1
ff.update()
# Reality: ValueError: Can not update, since there is no associated file
Full traceback(CLICK ME)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-1a1b47761739> in <module>
      1 ff[1] = np_arr1
----> 2 ff.update()

/opt/miniconda3/envs/audio_tester/lib/python3.7/site-packages/asdf/asdf.py in update(self, all_array_storage, all_array_compression, auto_inline, pad_blocks, include_block_index, version)
    893         if fd is None:
    894             raise ValueError(
--> 895                 "Can not update, since there is no associated file")
    896 
    897         if not fd.writable():

ValueError: Can not update, since there is no associated file

In this approach, the .write() method, could still have the argument fd, but it would be optional. For consistency, .update() would also accept a fd argument. These would have the same functionality as .file_destination().


Edit: To prevent the need of a new function .file_destination() that only updates ._fd, .write() could still have fd as optional argument, but inside there is a check that ._fd cannot be None if this function is invoked.

NumesSanguis avatar Dec 13 '19 06:12 NumesSanguis