
devito Data object issues

Open tjb900 opened this issue 4 years ago • 11 comments

Some of the methods of devito.Data rely on attributes set in __new__ and __array_finalize__, but these methods are not always called.

For example, this code:

from devito import Grid, Function
import pickle
import traceback

g = Grid((20, 20))
u = Function(name='u', grid=g)

d1 = u.data
d2 = pickle.loads(pickle.dumps(d1))

try:
    # Will fail in array_finalize due to lack of _index_stash
    print(d2)
except Exception:
    traceback.print_exc()

try:
    # Fails in __del__ due to lack of memfree_args
    del d2
except Exception:
    traceback.print_exc()

demonstrates the issue.

<snip>
AttributeError: 'Data' object has no attribute '_index_stash'
<snip>
AttributeError: 'Data' object has no attribute '_memfree_args'
<snip>

tjb900 avatar Mar 24 '20 07:03 tjb900

I think there's a case to be made that not calling __array_finalize__ at unpickling time is a numpy bug, since this document https://docs.scipy.org/doc/numpy/user/basics.subclassing.html seems to claim that __array_finalize__ covers all methods of instance creation:

Because __array_finalize__ is the only method that always sees new instances being created, it is the sensible place to fill in instance defaults for new object attributes, among other tasks.
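For reference, the pattern that quote is describing looks roughly like this. A minimal sketch with a made-up TrackedArray class (not Devito's actual Data): defaults for custom attributes are filled in inside __array_finalize__, since it runs on explicit construction, view casting, and new-from-template alike.

```python
import numpy as np

class TrackedArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # obj is None for explicit construction; otherwise inherit the
        # parent's attribute, falling back to a safe default.
        self.info = getattr(obj, 'info', None)

a = TrackedArray([1, 2, 3], info='meta')
b = a[:2]        # view casting also goes through __array_finalize__
print(b.info)    # 'meta', inherited from the parent view
```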

tjb900 avatar Mar 24 '20 07:03 tjb900

Data objects are not designed to be 'stripped' away from their Function in this manner. Is there any reason why:

from devito import Grid, Function
import pickle
import traceback
g = Grid((20, 20))
u = Function(name='u', grid=g)
u2 = pickle.loads(pickle.dumps(u))
try:
    # Will fail in array_finalize due to lack of _index_stash
    print(u2.data)
except Exception:
    traceback.print_exc()
try:
    # Fails in __del__ due to lack of memfree_args
    del u2
except Exception:
    traceback.print_exc()

would not achieve what you want?

rhodrin avatar Mar 24 '20 09:03 rhodrin

Well, that might be true, but it's relatively easy to end up with standalone Data objects. e.g. this returns a Data:

def f():
    # create devito stuff
    # run operator
    output_i_want = u.data[:] * 2
    return output_i_want

And when these resulting objects are used in dask/distributed, they are pickled and then it makes for sad times.

Now, it would work if everyone knew to cast/copy everything to ndarray. But I think moving a couple of things around in data.py should make it a non-issue anyway?
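For illustration, the cast/copy workaround could look like this (a sketch using a plain numpy array as a stand-in for u.data, since the Operator setup is elided above):

```python
import pickle

import numpy as np

# Hypothetical workaround: convert the result to a plain ndarray before
# it leaves the function, so whatever dask/distributed pickles
# downstream is an ordinary numpy array, not a Data subclass.
def f(u_data):
    # np.asarray drops any ndarray subclass, returning a base ndarray
    output_i_want = np.asarray(u_data[:] * 2)
    return output_i_want

result = f(np.ones((4, 4)))          # stand-in for u.data
restored = pickle.loads(pickle.dumps(result))
print(type(restored).__name__)       # a plain ndarray round-trips safely
```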

tjb900 avatar Mar 25 '20 02:03 tjb900

Yeah, I see what you mean. So creating a 'new' Data object clearly works, but I'll have a think re. the best way to get 'that' attribute behaviour in data.py (probably through implementing our own __reduce__ and __setstate__).

from devito import Grid, Function
from devito.data import Data
import pickle

g = Grid((20, 20))
u = Function(name='u', grid=g)

u.data[:] = 1.0

d1 = u.data

d2 = pickle.loads(pickle.dumps(d1))

d3 = Data(d2.shape, d2.dtype.type,
          decomposition=None,
          modulo=(False,)*len(d2.shape))

d3.data = d2.data

d2._memfree_args = None
del d2

print(d3)
del d3

rhodrin avatar Mar 25 '20 12:03 rhodrin

For my reference: seems like a solution could be in here https://stackoverflow.com/questions/26598109/preserve-custom-attributes-when-pickling-subclass-of-numpy-array
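The pattern in that answer extends ndarray's own __reduce__ so custom attributes travel with the pickle, and restores them in __setstate__. A minimal sketch with a made-up AttrArray class (not the real Data, which carries extra state such as its decomposition):

```python
import pickle

import numpy as np

class AttrArray(np.ndarray):
    def __new__(cls, input_array, info=None):
        obj = np.asarray(input_array).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        self.info = getattr(obj, 'info', None)

    def __reduce__(self):
        # Append the custom attribute to ndarray's pickle state.
        reconstruct, args, state = super().__reduce__()
        return (reconstruct, args, state + (self.info,))

    def __setstate__(self, state):
        # Pop our attribute back off, then let ndarray restore the rest.
        self.info = state[-1]
        super().__setstate__(state[:-1])

a = AttrArray([1.0, 2.0], info='kept')
b = pickle.loads(pickle.dumps(a))
print(b.info)    # 'kept' survives the round-trip
```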

rhodrin avatar Mar 25 '20 13:03 rhodrin

So the changes required to properly pickle our Data type seem non-trivial - for a start mpi4py.MPI.Comm objects can't be pickled.

Therefore I'd suggest casting to numpy arrays prior to pickling as you mention above.

~Will leave this open for a day or so to allow anyone to post comments/suggestions before closing.~

rhodrin avatar Mar 31 '20 12:03 rhodrin

After a quick discussion, we'll keep this open, but it's not clear to us in what situations (aside from user convenience) this functionality is particularly important. (Pickling Data may even have some unintended negative side-effects.)

We'll revisit this if shown it's particularly needed.

rhodrin avatar Mar 31 '20 16:03 rhodrin

Sorry, just getting back to this.

It was the simplest way to make a reproducer, but I'm not 100% sure that pickling is the only source of the issue. At least according to that numpy documentation quoted above (repeated here):

Because __array_finalize__ is the only method that always sees new instances being created, it is the sensible place to fill in instance defaults for new object attributes, among other tasks.

I read this to mean that you cannot rely even on __new__ being called, which means the self._memfree_args reference in __del__ causes an error. The fix for that case is as simple as setting _memfree_args to None in __array_finalize__.

On the pickling front - I see your argument. However, in that case I think we should go further and give a useful error message to the end user, i.e. override __reduce__ or one of its friends to raise an exception, rather than leave a path that creates a half-initialised object?

tjb900 avatar May 25 '20 08:05 tjb900

I'm hitting the same issue. It's really easy to trip on this when working with Dask so I would push for increasing the priority of this.

navjotk avatar Jul 01 '20 13:07 navjotk

@tjb900

but I'm not 100% sure that pickling is the only source of the issue.

Indeed, that's a good point - I'll take a careful look.

On the pickling front - I see your argument. However, in that case I think we should go further to be able to give a useful error message to the end-user. i.e. overriding __reduce__ or one of its friends to raise an Exception, rather than having a path to creating an object that's in a half-initialised state?

Yes, if it's not currently supported we should be throwing an error via, e.g., overriding __reduce__.
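A sketch of that fail-fast approach, using a made-up NoPickleArray subclass rather than Data itself; overriding __reduce_ex__ covers every pickling path, since pickle always goes through it:

```python
import pickle

import numpy as np

class NoPickleArray(np.ndarray):
    # Raise early with an actionable message instead of letting pickle
    # produce a half-initialised object.
    def __reduce_ex__(self, protocol):
        raise TypeError(
            "NoPickleArray cannot be pickled; copy it to a plain "
            "numpy.ndarray first, e.g. with numpy.asarray(obj)")

a = np.zeros(3).view(NoPickleArray)
try:
    pickle.dumps(a)
except TypeError as exc:
    print(exc)
```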

@navjotk Do you have a reproducer that doesn't involve pickling?

rhodrin avatar Jul 01 '20 16:07 rhodrin

My scenario involves Dask which implicitly pickles all function arguments and return values so no. Maybe we should be raising an exception as a reminder to pick up on this "mistake"?

I knew Data objects wouldn't behave well with Dask but still ended up with a function like this, basically because I didn't realise my result was a Data object.

The only thing that made me realise why this happened was posting the error message on Slack and @FabioLuporini pointing me here - and then I was facepalming.

navjotk avatar Jul 01 '20 16:07 navjotk