Support for Dill or Cloudpickle?

Open · altaetran opened this issue 6 years ago · 2 comments

Hi,

I'm looking to submit scripts that contain lambda functions to a compute cluster, but this seems impossible with the current setup because dispy uses pickle rather than dill or cloudpickle. Do you have suggestions for how I could submit these lambda functions? It seems that I also cannot pickle them with dill myself and then send the pickled data over. Thanks!
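Why plain pickle refuses lambdas: pickle stores a function by reference (its module plus qualified name), and a lambda, especially one created inside another function, has no name the receiving side could import. A small stdlib-only check; `make_adder` and `try_pickle` are hypothetical helpers for illustration.

```python
import pickle


def make_adder(n):
    # A closure built from a lambda: the kind of callable the question is about.
    return lambda x: x + n


def try_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type varies by object and Python version;
        # for a local lambda, pickle cannot find it by qualified name.
        return False


if __name__ == '__main__':
    print(try_pickle(len))            # True: builtins are pickled by reference
    print(try_pickle(make_adder(5)))  # False: local lambda has no importable name
```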

altaetran, Apr 27 '18 05:04

With 'depends' or 'dispy_job_depends' you can pass any data: serialize it on the client, then deserialize it in the compute function on the node. cloudpickle seems to be compatible with pickle, so you can use it to serialize on the client and that should just work (I think).
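A minimal sketch of that approach, simulated in a single process. `write_payload` and `compute` are hypothetical names; the payload file stands in for what you would list in 'dispy_job_depends' (or 'depends') so it is shipped to the node. The node side uses plain `pickle.loads`, which should also read cloudpickle output as long as cloudpickle is installed on the node. The ImportError fallback exists only so the sketch runs without cloudpickle; in that case only callables importable by name (like the `partial(pow, 2)` below) will work.

```python
import os
import pickle
import tempfile
from functools import partial


def write_payload(func, path):
    """Client side: serialize func into a file that would be sent to the node."""
    try:
        import cloudpickle  # serializes lambdas and closures by value
        data = cloudpickle.dumps(func)
    except ImportError:
        # Fallback for this sketch: plain pickle handles only named callables.
        data = pickle.dumps(func)
    with open(path, 'wb') as f:
        f.write(data)
    return path


def compute(payload_file, x):
    """Node side: rebuild the callable from the shipped file and apply it."""
    import pickle  # imports inside the compute function run on the node
    with open(payload_file, 'rb') as f:
        func = pickle.loads(f.read())
    return func(x)


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        path = write_payload(partial(pow, 2), os.path.join(tmp, 'func.pkl'))
        print(compute(path, 10))  # 1024
```

With dispy this would presumably become `cluster = dispy.JobCluster(compute)` and something like `cluster.submit('func.pkl', 10, dispy_job_depends=['func.pkl'])`, passing just the file name since the file lands in the job's working directory; that exact call is untested.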

I was not aware of cloudpickle; maybe supporting it would be useful to more users. If you want to use it right now (so you don't have to serialize as described above), you can change the serialize and deserialize functions in the pycos/__init__.py file to use cloudpickle. No other changes are needed.
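A sketch of what cloudpickle-based `serialize`/`deserialize` functions might look like. The signatures (object to bytes, bytes to object) are assumptions, not copied from pycos, so check the real definitions in pycos/__init__.py before editing. Only `serialize` strictly needs cloudpickle, since its output is ordinary pickle data that `pickle.loads` can read, provided cloudpickle is importable wherever it is loaded. The import fallback is only there so the sketch runs without cloudpickle installed.

```python
import pickle

try:
    import cloudpickle
except ImportError:  # fallback only so this sketch runs without cloudpickle
    cloudpickle = None


def serialize(obj):
    # cloudpickle serializes lambdas and closures by value.
    if cloudpickle is not None:
        return cloudpickle.dumps(obj)
    return pickle.dumps(obj)


def deserialize(data):
    # cloudpickle output is standard pickle data, so pickle.loads reads it.
    return pickle.loads(data)


if __name__ == '__main__':
    print(deserialize(serialize({'n': [1, 2, 3]})))
    if cloudpickle is not None:
        square = deserialize(serialize(lambda x: x * x))
        print(square(7))  # 49
```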

pgiri, Apr 27 '18 11:04

Looking into cloudpickle further (e.g., https://github.com/RaRe-Technologies/gensim/issues/558), cloudpickle (or dill) may not be a good idea in general because of performance issues. So instead of changing the pycos file, you can replace the serialize function in your dispy client program:

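if __name__ == '__main__':
    import cloudpickle
    import dispy

    def serialize(obj):
        # Must return the bytes; without the return, dispy would get None.
        return cloudpickle.dumps(obj)

    # Rebind dispy's module-level serialize so the client uses cloudpickle.
    dispy.serialize = serialize
    # ... create the JobCluster and submit jobs as usual ...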

I haven't tested it this way, but since cloudpickle is a drop-in replacement for pickle, it may work.
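Why a one-line override like `dispy.serialize = serialize` can take effect: a module's functions look up global names such as `serialize` at call time, so assigning a new function to the module attribute changes what they call. The sketch shows this with a throwaway module object; `fakelib`, `send` and `tagged_serialize` are hypothetical stand-ins. It only carries over to dispy if dispy's code calls the bare module-level name `serialize` when sending jobs, which is an assumption here. Any reference grabbed before the override (e.g. via `from dispy import serialize`) keeps the old function.

```python
import pickle
import types

# Hypothetical stand-in for a library module: send() calls the
# module-level name `serialize`, resolved each time send() runs.
fakelib = types.ModuleType('fakelib')
exec(
    "import pickle\n"
    "def serialize(obj):\n"
    "    return pickle.dumps(obj)\n"
    "def send(obj):\n"
    "    return serialize(obj)\n",
    fakelib.__dict__,
)


def tagged_serialize(obj):
    # Replacement serializer; tags its output so the switch is visible.
    return b'custom:' + repr(obj).encode()


# A reference taken before patching keeps pointing at the original.
old_ref = fakelib.serialize
before = fakelib.send(42)

# Same pattern as dispy.serialize = serialize.
fakelib.serialize = tagged_serialize
after = fakelib.send(42)

print(before == pickle.dumps(42))  # True
print(after)                       # b'custom:42'
print(old_ref(42) == before)       # True: old reference is unaffected
```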

pgiri, Apr 27 '18 12:04