nolearn
Training with GPU, inference on CPU with a pickled model
When training with CUDA on a GPU machine, it is not possible to load the model on a machine without a GPU in a straightforward fashion, because some theano parameters are CUDA arrays. My reading is that this is a theano issue for which there are only workarounds. One way is to
- `save_params_to` on the GPU machine
- initialize an identical NeuralNet on the CPU machine
- `load_params_from` on that machine.
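A minimal sketch of that workflow (assuming `net_gpu` is the trained net, `my_layers` is the same layer definition that was used on the GPU machine, and the file name is arbitrary):

```python
from nolearn.lasagne import NeuralNet

# --- on the GPU machine ---
net_gpu.save_params_to('weights.pkl')   # stores the parameter values as plain numpy arrays

# --- on the CPU machine ---
net_cpu = NeuralNet(layers=my_layers, update_learning_rate=0.01)  # same architecture
net_cpu.load_params_from('weights.pkl')  # builds the net (if needed) and sets the values
```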
However, this requires quite a bit of extra effort in some situations, e.g. if the NeuralNet is part of an sklearn Pipeline. You have to
- save the Pipeline's steps except for the one containing the NeuralNet on the GPU machine
- save the latter's parameters separately
- load the Pipeline on the CPU machine
- add a fresh NeuralNet to the right place in the Pipeline
- load the parameters into that net (a sketch of these steps is shown below).
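A rough sketch of that procedure, assuming a hypothetical Pipeline whose NeuralNet step is named 'net' (all variable names and file names here are examples):

```python
import pickle
from sklearn.pipeline import Pipeline
from nolearn.lasagne import NeuralNet

# --- on the GPU machine ---
net = pipeline.named_steps['net']
net.save_params_to('net_weights.pkl')                  # save the net's parameters separately
other_steps = [(name, step) for name, step in pipeline.steps if name != 'net']
with open('pipeline_rest.pkl', 'wb') as f:             # save the remaining Pipeline steps
    pickle.dump(other_steps, f, -1)

# --- on the CPU machine ---
with open('pipeline_rest.pkl', 'rb') as f:
    other_steps = pickle.load(f)
fresh_net = NeuralNet(layers=my_layers, update_learning_rate=0.01)  # same architecture
fresh_net.load_params_from('net_weights.pkl')
pipeline = Pipeline(other_steps + [('net', fresh_net)])  # put the fresh net back in place
```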
I wonder if there is a better way. Maybe it is possible to add a method to NeuralNet that converts Cuda Arrays to normal theano shared variables? Does someone know of a better approach?
You're probably seeing the effect of this change.
Can you try setting `config.reoptimize_unpickled_function` to True?
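In case it helps, this is how that theano option would typically be set (I'm only assuming the standard theano configuration mechanisms here):

```python
import theano

# Set at runtime before unpickling...
theano.config.reoptimize_unpickled_function = True

# ...or via the environment before starting Python:
#   THEANO_FLAGS='reoptimize_unpickled_function=True' python my_script.py
```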
I was referring to the same problem as mentioned here. The option you mentioned would not solve this, or would it? (I can't test it right now.)
That thread is a little confusing, with multiple issues. Can you paste the traceback that you're getting?
I get the same `Cuda not found. Cannot unpickle CudaNdarray` error. Shared variables are saved as CUDA arrays that you can't load on a machine without CUDA. One suggestion would be a method that somehow implements the solution proposed in that thread. I'm not sure, though, whether this would cover all use cases or whether there might be a better solution.
I have a solution that seems to work:
```python
import os
import pickle
import tempfile

from nolearn.lasagne import NeuralNet


class PortableNeuralNet(NeuralNet):
    def __setstate__(self, state):
        # Write the saved parameter values to a temporary file and restore
        # them via load_params_from after re-initializing the net.
        with tempfile.TemporaryDirectory() as tmpdirname:
            filename = os.path.join(tmpdirname, 'tmp_weights.pkl')
            with open(filename, 'wb') as f:
                pickle.dump(state['_params_temp_save'], f, -1)
            del state['_params_temp_save']
            self.__dict__.update(state)
            self.initialize()
            self.load_params_from(filename)

    def __getstate__(self):
        state = dict(self.__dict__)
        params = self.get_all_params_values()
        for key in list(state.keys()):  # list() avoids a RuntimeError (dict changes size)
            if key == 'train_history_':
                continue
            if key.endswith('_'):
                del state[key]
        del state['_output_layer']
        del state['_initialized']
        state['_params_temp_save'] = params
        return state
```
I don't know whether this is worth integrating or not. Instead of a new class, it could be a switch in the NeuralNet class. What do you think, Daniel?
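For context, a hypothetical usage sketch (file name and variable names are just examples): the subclass is pickled as usual on the GPU machine and unpickled on the CPU machine.

```python
import pickle

# --- on the GPU machine, after training ---
with open('net.pkl', 'wb') as f:
    pickle.dump(portable_net, f, -1)    # portable_net: a trained PortableNeuralNet

# --- on the CPU machine ---
with open('net.pkl', 'rb') as f:
    net = pickle.load(f)                # __setstate__ re-initializes and loads the params
predictions = net.predict(X_test)       # X_test: whatever data you want to predict on
```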
@BenjaminBossan Do you mind describing the difference between this and the implementation that was removed in #228?
I looked at the problem a little more and now understand the issue better. The code that was removed in #228 had the same issue, since it did not do anything with the layer instances (`layers_`) in `__getstate__`.
So then I tried to come up with my own variation of the code that you proposed:
```python
from nolearn.lasagne import NeuralNet


class YetAnotherPortableNeuralNet(NeuralNet):
    def __setstate__(self, state):
        params = state.pop('__params__', None)
        self.__dict__.update(state)
        self.initialize()
        if params is not None:
            self.load_params_from(params)

    def __getstate__(self):
        state = dict(self.__dict__)
        if self._initialized:
            params = self.get_all_params_values()
        else:
            params = None
        for attr in (
                'train_iter_',
                'eval_iter_',
                'predict_iter_',
                '_initialized',
                '_get_output_fn_cache',
                '_output_layer',
                'layers_',
                'layers',
                ):
            if attr in state:
                del state[attr]
        state['__params__'] = params
        return state
```
I thought your proposal was good; it just needed a bit of refactoring (to avoid writing out a temporary file, and to be more explicit about which attributes are removed on the way out).
But then I found out that this approach has its own problems. Namely, if `self.layers` is already a list of layer instances, it won't work, since those instances contain the CUDA arrays, which will then be pickled. Deleting those instances on the way out also doesn't work, for obvious reasons.
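To illustrate the problem with a minimal sketch (assuming a lasagne/theano setup on a GPU machine; the layers are arbitrary):

```python
import lasagne

# When layers are passed as instances, their parameters are created as theano
# shared variables at construction time.
l_in = lasagne.layers.InputLayer(shape=(None, 100))
l_out = lasagne.layers.DenseLayer(l_in, num_units=10)

# On a GPU machine, l_out.W and l_out.b are GPU-backed shared variables, so
# pickling the instances held in self.layers embeds CudaNdarrays that a
# CPU-only machine cannot unpickle.
```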
I'm thinking that as long as we can't fix this in the general case, we shouldn't put code like this into nolearn.lasagne itself. But we can point people to solutions that might work for them. One such solution might be this script inside of pylearn2, which I'm about to try out.
> But then I found out that this approach has its own problems. Namely, if self.layers is already a list of layer instances
Right, I did not think about that possibility. We could raise an error in that case but it is not a satisfying solution.
> One such solution might be this script inside of pylearn2 which I'm about to try out.
I believe @alattner tried that to no avail.
> I'm thinking that as long as we can't fix this in the general case, we shouldn't put code like this into nolearn.lasagne itself.
I agree but it would be nice to be able to somehow use this kludge by checking out a specific nolearn branch or something.
> I believe @alattner tried that to no avail.
No, I haven't tried that script inside pylearn2. I tried the `config.experimental.unpickle_gpu_on_cpu` option with no success.
I tried the script and it failed with some weird recursion error.
`sys.setrecursionlimit(10 ** 999)`
OK just let me know if that's a joke or if it actually works. ;-)
I would not try it :)
Anyway, do you see a working solution for this?
I'll take another look next week. So far didn't have much luck.
So much for not breaking code in this PR :)
For those who use this snippet, in the part shown below, change '_output_layer' to '_output_layers':
```python
for attr in (
        'train_iter_',
        'eval_iter_',
        'predict_iter_',
        '_initialized',
        '_get_output_fn_cache',
        '_output_layer',
        'layers_',
        'layers',
        ):
    if attr in state:
        del state[attr]
```
Is there any update on how to train on GPU, save, and load on CPU for inference?
@kungfujam Note that as per the original post, you can always do this:
- `save_params_to` on the GPU machine
- initialize an identical NeuralNet on the CPU machine
- `load_params_from` on that machine.
This issue is about not being able to take a network that was pickled after training on a GPU and use it in a CPU environment, which is sometimes more convenient.
Thanks for the clarification. I'm using that method currently. Not a huge deal, but it would be nice to be able to just pickle the net.