segfault when loading array
The following code causes a segfault on my machine:
import numpy as np
import asdf
asdf.AsdfFile({'data':np.arange(100)}).write_to('tmp.asdf')
with asdf.open('tmp.asdf') as f:
    d = f.tree['data'][:]
print(d)
I'm on Python 3.8, NumPy 1.19.4, asdf 2.7.1 from pip (fresh conda env). It also fails on asdf master. The segfault happens on the last line, but only when trying to force a non-lazy load with [:].
I'm probably abusing asdf with this syntax! I was experimenting to see if I could use [:] as shorthand for lazy_load=False, copy_arrays=True. But I think calling copy() achieves the same thing (possibly at the minor expense of a temporary, extra copy?). Or I should stop being lazy and just use the proper arguments!
I think we can do better than a segfault here, but [:] just gives you a view over the same memory-mapped ndarray, so it also loses access to the data when the file is closed. copy() is probably your best option -- I think that will only result in one copy of the ndarray in process memory, since the original ndarray lives in the page cache.
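To make the view-versus-copy distinction concrete, here is a minimal sketch that uses a plain np.memmap as a stand-in for asdf's memory-mapped block (an assumption for illustration; asdf's own array wrapper is not involved, and the scratch file name is made up):

```python
import os
import tempfile

import numpy as np

# Write 100 integers to a scratch file, then map it read-only.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
np.arange(100, dtype=np.int64).tofile(path)
mapped = np.memmap(path, dtype=np.int64, mode="r")

view = mapped[:]        # still backed by the memory-mapped file
copied = mapped.copy()  # independent buffer in process memory

shares_view = np.shares_memory(view, mapped)
shares_copy = np.shares_memory(copied, mapped)
print(shares_view, shares_copy)
```

Only the copy is detached from the mapping, which is why it is the one that should stay usable after the file goes away.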
Let's keep this issue open so that we remember to replace the segfault with a reasonable exception.
The case is handled when it's not a view.
In [10]: import numpy as np
...: import asdf
...:
...: asdf.AsdfFile({'data':np.arange(100)}).write_to('tmp.asdf')
...: with asdf.open('tmp.asdf') as f:
...:     d = f.tree['data']
...:
In [11]: print(d)
<array (unloaded) shape: [100] dtype: int64>
In [13]: d += 1
---------------------------------------------------------------------------
OSError Traceback (most recent call last)
<ipython-input-13-2237cfe673cf> in <module>
----> 1 d += 1
~/miniconda3/envs/jwst/lib/python3.8/site-packages/asdf/tags/core/ndarray.py in __operation__(self, *args)
517 def _make_operation(name):
518 def __operation__(self, *args):
--> 519 return getattr(self._make_array(), name)(*args)
520 return __operation__
521
~/miniconda3/envs/jwst/lib/python3.8/site-packages/asdf/tags/core/ndarray.py in _make_array(self)
266
267 self._array = np.ndarray(
--> 268 shape, dtype, block.data,
269 self._offset, self._strides, self._order)
270 self._array = self._apply_mask(self._array, self._mask)
~/miniconda3/envs/jwst/lib/python3.8/site-packages/asdf/block.py in data(self)
1167 if self._data is None:
1168 if self._fd.is_closed():
-> 1169 raise IOError(
1170 "ASDF file has already been closed. "
1171 "Can not get the data.")
OSError: ASDF file has already been closed. Can not get the data.
So it seems the issue is handling the case where it is a view?
One of the astropy maintainers pointed out a technique they use for safely closing mmaps:
https://github.com/astropy/astropy/blob/bb4c1973faffea88edc9068df6e95d4452e82928/astropy/io/fits/file.py#L405-L417
Maybe useful for asdf?
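As I read it, the idea in the linked code is to close the mmap only when nothing else still references it, using sys.getrefcount. A rough stdlib-only sketch of that idea follows. MappedFile is a hypothetical class, not asdf's or astropy's API, and the refcount check relies on CPython behaviour:

```python
import mmap
import sys
import tempfile


class MappedFile:
    """Hypothetical wrapper around a read-only mmap (illustration only)."""

    def __init__(self, path):
        self._file = open(path, "rb")
        self._mmap = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def buffer(self):
        return self._mmap

    def close(self):
        """Close the file; unmap only if nothing else references the mmap.

        Returns True if the mapping was closed here, False if it was left
        for its remaining users.
        """
        unmapped = False
        # getrefcount counts its own argument, so 2 means self._mmap is
        # the only other reference (a CPython implementation detail).
        if sys.getrefcount(self._mmap) == 2:
            self._mmap.close()
            unmapped = True
        self._mmap = None
        self._file.close()
        return unmapped


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abcdef")

in_use = MappedFile(tmp.name)
view = memoryview(in_use.buffer())    # keeps a reference to the mmap
closed_while_in_use = in_use.close()  # mapping is left alone
first_byte = view[0]                  # still safe to read

unused = MappedFile(tmp.name)
closed_when_unused = unused.close()   # no outside references, so unmapped
```

With this approach a mapping that is still in use gets released once its last user is garbage collected, instead of being pulled out from under a live view.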