dispy
Can't pickle
The use of Python's multiprocessing module (for example, in dispynode.py) prohibits the use of data objects that cannot be pickled, leading to the error "Can't pickle <class 'module'>: attribute lookup module on builtins failed".
Does dispy manage this in some way, given that there are data objects that are not picklable?
The data is sent from the client to the nodes over the network, so the data has to be serialized (it is not because of multiprocessing). As mentioned in the documentation, if objects can't be serialized automatically, the classes whose instances are sent should provide __getstate__ and __setstate__ methods to serialize and deserialize them. See, for example, the _DispyJob_ class in dispy's '__init__.py'.
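For illustration, a minimal sketch of that pattern (hypothetical names, not dispy's actual _DispyJob_ code): an unpicklable attribute such as an open file is dropped in __getstate__ and recreated in __setstate__ on the receiving node.

```python
class JobData:
    def __init__(self, path, payload):
        self.path = path
        self.payload = payload
        self.fh = open(path, 'rb')   # file objects cannot be pickled

    def __getstate__(self):
        # return a plain dict of picklable attributes only
        state = self.__dict__.copy()
        del state['fh']
        return state

    def __setstate__(self, state):
        # restore attributes, then recreate the unpicklable one
        for name, value in state.items():
            setattr(self, name, value)
        self.fh = open(self.path, 'rb')
```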
@pgiri As a newbie to dispy, I initially want to run n_worker processes on a single machine with n cores (in my case between 32 and 64). The preferred option is to load the static data object (a keras model which is not picklable) into each process and then feed all the processes an iterable of data. Memory is not a problem.
An alternative option is for the n processes to share the data object in shared memory, but that would likely cause problems and not be performant.
If the data model is loaded simply as, for example:
model = keras.models.Model(inputs=base_model.input, outputs=base_model.get_layer('block4_pool').output)
then what does a class implementing __getstate__ and __setstate__ look like to support serialization for dispy?
If you are sending an object with the submit method of a cluster, then that object must be serializable. Python's pickle can serialize data in most cases, but if objects have attributes such as file pointers, locks, etc., the class definition for such objects must handle serialization by, for example, excluding those attributes. It is common to define __getstate__ to return a dictionary and __setstate__ to use setattr to set attributes from that dictionary, as done in _DispyJob_ mentioned above. I haven't used keras, so I can't offer specific instructions about it.
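For what it's worth, one way to apply that pattern to the case above would be to wrap the model in a class that excludes the model object from the pickled state and rebuilds it in __setstate__ on the node. This is an untested sketch: it assumes the model can be reconstructed from code on each node, and it guesses base_model to be VGG16 only because of the 'block4_pool' layer name in the question.

```python
class ModelHolder:
    def __init__(self, layer_name='block4_pool'):
        self.layer_name = layer_name
        self.model = self._build()

    def _build(self):
        # assumption: keras is installed on the node and the base model
        # (guessed to be VGG16 here) can be created/loaded locally
        from tensorflow import keras
        base_model = keras.applications.VGG16(weights='imagenet')
        return keras.models.Model(
            inputs=base_model.input,
            outputs=base_model.get_layer(self.layer_name).output)

    def __getstate__(self):
        state = self.__dict__.copy()
        del state['model']            # the model object itself is not picklable
        return state

    def __setstate__(self, state):
        for name, value in state.items():
            setattr(self, name, value)
        self.model = self._build()    # rebuild the model on the node
```

Whether rebuilding the model on deserialization is acceptable depends on the application; it may be simpler to construct the model inside the computation that runs on the node, so that only the input data needs to be serialized.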