segfault when loading array

Open lgarrison opened this issue 5 years ago • 3 comments

The following code causes a segfault on my machine:

import numpy as np
import asdf

asdf.AsdfFile({'data':np.arange(100)}).write_to('tmp.asdf')
with asdf.open('tmp.asdf') as f:
  d = f.tree['data'][:]
print(d)

I'm on Python 3.8, NumPy 1.19.4, and asdf 2.7.1 from pip (fresh conda env). It also fails on asdf master. The segfault happens on the last line, but only when trying to force a non-lazy load with [:].

I'm probably abusing asdf with this syntax! I was experimenting to see if I could use [:] as shorthand for lazy_load=False, copy_arrays=True. But I think calling copy() achieves the same thing (possibly at the minor expense of a temporary, extra copy?). Or I should stop being lazy and just use the proper arguments!

lgarrison avatar Nov 05 '20 23:11 lgarrison

I think we can do better than a segfault here, but [:] just gives you a view over the same memory-mapped ndarray, so it also loses access to the data when the file is closed. copy() is probably your best option -- I think that will only result in one copy of the ndarray in process memory, since the original ndarray lives in the page cache.
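To make the view-versus-copy distinction concrete, here is a minimal sketch that uses np.memmap as a stand-in for asdf's memory-mapped block. That substitution is an assumption for illustration only (asdf manages its own maps internally), and the file name is made up:

```python
import os
import tempfile

import numpy as np

# Stand-in for an ASDF block: a raw file mapped with np.memmap (an assumption
# for illustration; asdf manages its own memory maps internally).
path = os.path.join(tempfile.mkdtemp(), "block.bin")
np.arange(100, dtype=np.int64).tofile(path)
mapped = np.memmap(path, dtype=np.int64, mode="r", shape=(100,))

view = mapped[:]        # no data copied; still backed by the mapping
copied = mapped.copy()  # data copied into ordinary process memory

print(np.shares_memory(view, mapped))    # True
print(np.shares_memory(copied, mapped))  # False
```

Only the copied array is independent of the mapping, which is why it is the one that can safely outlive the file.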

Let's keep this issue open so that we remember to replace the segfault with a reasonable exception.

eslavich avatar Nov 06 '20 00:11 eslavich

The case is handled when it's not a view.

In [10]: import numpy as np
    ...: import asdf
    ...: 
    ...: asdf.AsdfFile({'data':np.arange(100)}).write_to('tmp.asdf')
    ...: with asdf.open('tmp.asdf') as f:
    ...:   d = f.tree['data']
    ...: 

In [11]: print(d)
<array (unloaded) shape: [100] dtype: int64>

In [13]: d += 1
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-13-2237cfe673cf> in <module>
----> 1 d += 1

~/miniconda3/envs/jwst/lib/python3.8/site-packages/asdf/tags/core/ndarray.py in __operation__(self, *args)
    517 def _make_operation(name):
    518     def __operation__(self, *args):
--> 519         return getattr(self._make_array(), name)(*args)
    520     return __operation__
    521 

~/miniconda3/envs/jwst/lib/python3.8/site-packages/asdf/tags/core/ndarray.py in _make_array(self)
    266 
    267             self._array = np.ndarray(
--> 268                 shape, dtype, block.data,
    269                 self._offset, self._strides, self._order)
    270             self._array = self._apply_mask(self._array, self._mask)

~/miniconda3/envs/jwst/lib/python3.8/site-packages/asdf/block.py in data(self)
   1167         if self._data is None:
   1168             if self._fd.is_closed():
-> 1169                 raise IOError(
   1170                     "ASDF file has already been closed. "
   1171                     "Can not get the data.")

OSError: ASDF file has already been closed. Can not get the data.

So it seems the issue is handling the case where it is a view?

jdavies-st avatar Nov 06 '20 14:11 jdavies-st

One of the astropy maintainers pointed out a technique they use for safely closing mmaps:

https://github.com/astropy/astropy/blob/bb4c1973faffea88edc9068df6e95d4452e82928/astropy/io/fits/file.py#L405-L417

Maybe useful for asdf?
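A minimal sketch of that idea, assuming the technique amounts to checking the map's reference count before closing it. The class and method names here are hypothetical (not asdf or astropy API), the sketch uses only the standard library, and sys.getrefcount is CPython-specific:

```python
import mmap
import os
import sys
import tempfile


class MappedFile:
    """Hypothetical wrapper that owns one read-only memory map."""

    def __init__(self, path):
        self._fd = open(path, "rb")
        self._mmap = mmap.mmap(self._fd.fileno(), 0, access=mmap.ACCESS_READ)
        # Reference count seen when only this object holds the map
        # (CPython-specific; includes getrefcount's own temporary reference).
        self._baseline = sys.getrefcount(self._mmap)

    def view(self):
        # The memoryview holds a reference to the map for as long as it lives.
        return memoryview(self._mmap)

    def maybe_close(self):
        """Close the map only if nothing else still refers to it."""
        if self._mmap is None:
            return True
        if sys.getrefcount(self._mmap) > self._baseline:
            return False  # a view is still alive; leave the map open
        self._mmap.close()
        self._mmap = None
        self._fd.close()
        return True


path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as fh:
    fh.write(bytes(range(16)))

mf = MappedFile(path)
v = mf.view()
closed_with_view = mf.maybe_close()      # view alive, so the map stays open
first_byte = v[0]                        # still safe to read
v.release()                              # drop the view's hold on the map
closed_after_release = mf.maybe_close()  # nothing else refers to it now
```

The trade-off is that the map can stay open longer than the file object that created it, in exchange for never unmapping memory that a live view still points at.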

eslavich avatar May 07 '21 15:05 eslavich