nolearn
Training with GPU, inference on CPU with a pickled model
When training with CUDA on a GPU machine, it is not possible to load the model on a machine without a GPU in a straightforward fashion, because some theano parameters are CUDA arrays. My reading is that this is a theano issue for which there are only workarounds. One way is to
- `save_params_to` on the GPU machine
- initialize an identical NeuralNet on the CPU machine
- `load_params_from` on that machine.
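A minimal sketch of that workflow (assuming `net_gpu` is the trained net, `my_layers` is the same layer definition that was used on the GPU machine, and the file name is arbitrary):

```python
from nolearn.lasagne import NeuralNet

# --- on the GPU machine ---
net_gpu.save_params_to('weights.pkl')   # stores the parameter values as plain numpy arrays

# --- on the CPU machine ---
net_cpu = NeuralNet(layers=my_layers, update_learning_rate=0.01)  # same architecture
net_cpu.load_params_from('weights.pkl')  # builds the net (if needed) and sets the values
```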
However, this requires quite a bit of extra effort in some situations, e.g. if the NeuralNet is part of an sklearn Pipeline. You have to
- save the Pipeline's steps except for the one containing the NeuralNet on the GPU machine
- save the latter's parameters separately
- load the Pipeline on the CPU machine
- add a fresh NeuralNet to the right place in the Pipeline
- load the parameters into that net (a sketch of these steps is shown below).
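A rough sketch of that procedure, assuming a hypothetical Pipeline whose NeuralNet step is named 'net' (all variable names and file names here are examples):

```python
import pickle
from sklearn.pipeline import Pipeline
from nolearn.lasagne import NeuralNet

# --- on the GPU machine ---
net = pipeline.named_steps['net']
net.save_params_to('net_weights.pkl')                  # save the net's parameters separately
other_steps = [(name, step) for name, step in pipeline.steps if name != 'net']
with open('pipeline_rest.pkl', 'wb') as f:             # save the remaining Pipeline steps
    pickle.dump(other_steps, f, -1)

# --- on the CPU machine ---
with open('pipeline_rest.pkl', 'rb') as f:
    other_steps = pickle.load(f)
fresh_net = NeuralNet(layers=my_layers, update_learning_rate=0.01)  # same architecture
fresh_net.load_params_from('net_weights.pkl')
pipeline = Pipeline(other_steps + [('net', fresh_net)])  # put the fresh net back in place
```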
I wonder if there is a better way. Maybe it is possible to add a method to NeuralNet that converts Cuda Arrays to normal theano shared variables? Does someone know of a better approach?
You're probably seeing the effect of this change.
Can you try setting `config.reoptimize_unpickled_function` to True?
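In case it helps, this is how that theano option would typically be set (I'm only assuming the standard theano configuration mechanisms here):

```python
import theano

# Set at runtime before unpickling...
theano.config.reoptimize_unpickled_function = True

# ...or via the environment before starting Python:
#   THEANO_FLAGS='reoptimize_unpickled_function=True' python my_script.py
```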
I was referring to the same problem as mentioned here. The option you mentioned would not solve this, or would it? (I can't test it right now.)
That thread is a little confusing, with multiple issues. Can you paste the traceback that you're getting?
I get the same `Cuda not found. Cannot unpickle CudaNdarray` error. Shared variables are saved as CUDA arrays that you can't load on a machine without CUDA. One suggestion would be a method that somehow implements the solution proposed in that thread. I'm not sure, though, whether this would cover all use cases or whether there might be a better solution.
I have a solution that seems to work:
```python
import os
import pickle
import tempfile

from nolearn.lasagne import NeuralNet


class PortableNeuralNet(NeuralNet):
    def __setstate__(self, state):
        # Write the saved parameter values to a temporary file and restore
        # them via load_params_from after re-initializing the net.
        with tempfile.TemporaryDirectory() as tmpdirname:
            filename = os.path.join(tmpdirname, 'tmp_weights.pkl')
            with open(filename, 'wb') as f:
                pickle.dump(state['_params_temp_save'], f, -1)
            del state['_params_temp_save']
            self.__dict__.update(state)
            self.initialize()
            self.load_params_from(filename)

    def __getstate__(self):
        state = dict(self.__dict__)
        params = self.get_all_params_values()
        for key in list(state.keys()):  # list() avoids a RuntimeError (dict changes size)
            if key == 'train_history_':
                continue
            if key.endswith('_'):
                del state[key]
        del state['_output_layer']
        del state['_initialized']
        state['_params_temp_save'] = params
        return state
```
I don't know whether this is worth integrating or not. Instead of a new class, it could be a switch in the NeuralNet class. What do you think, Daniel?
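For context, a hypothetical usage sketch (file name and variable names are just examples): the subclass is pickled as usual on the GPU machine and unpickled on the CPU machine.

```python
import pickle

# --- on the GPU machine, after training ---
with open('net.pkl', 'wb') as f:
    pickle.dump(portable_net, f, -1)    # portable_net: a trained PortableNeuralNet

# --- on the CPU machine ---
with open('net.pkl', 'rb') as f:
    net = pickle.load(f)                # __setstate__ re-initializes and loads the params
predictions = net.predict(X_test)       # X_test: whatever data you want to predict on
```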
@BenjaminBossan Do you mind describing the difference between this and the implementation that was removed in #228?
I looked at the problem a little more and now understand the issue better. The code that was removed in #228 had the same issue, since it did not do anything with the layer instances (`layers_`) in `__getstate__`.
So then I tried to come up with my own variation of the code that you proposed:
```python
from nolearn.lasagne import NeuralNet


class YetAnotherPortableNeuralNet(NeuralNet):
    def __setstate__(self, state):
        params = state.pop('__params__', None)
        self.__dict__.update(state)
        self.initialize()
        if params is not None:
            self.load_params_from(params)

    def __getstate__(self):
        state = dict(self.__dict__)
        if self._initialized:
            params = self.get_all_params_values()
        else:
            params = None
        for attr in (
                'train_iter_',
                'eval_iter_',
                'predict_iter_',
                '_initialized',
                '_get_output_fn_cache',
                '_output_layer',
                'layers_',
                'layers',
                ):
            if attr in state:
                del state[attr]
        state['__params__'] = params
        return state
```
I thought your proposal was good; it just needed a bit of refactoring (to avoid writing out a temporary file, and to be more explicit about which attributes are removed on the way out).
But then I found out that this approach has its own problems. Namely, if `self.layers` is already a list of layer instances, it won't work, since those instances contain the CUDA arrays, which will then be pickled. Deleting those instances on the way out also doesn't work, for obvious reasons.
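To illustrate the problem with a minimal sketch (assuming a lasagne/theano setup on a GPU machine; the layers are arbitrary):

```python
import lasagne

# When layers are passed as instances, their parameters are created as theano
# shared variables at construction time.
l_in = lasagne.layers.InputLayer(shape=(None, 100))
l_out = lasagne.layers.DenseLayer(l_in, num_units=10)

# On a GPU machine, l_out.W and l_out.b are GPU-backed shared variables, so
# pickling the instances held in self.layers embeds CudaNdarrays that a
# CPU-only machine cannot unpickle.
```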
I'm thinking that as long as we can't fix this in the general case, we shouldn't put code like this into nolearn.lasagne itself. But we can point people to solutions that might work for them. One such solution might be this script inside of pylearn2, which I'm about to try out.
> But then I found out that this approach has its own problems. Namely, if self.layers is already a list of layer instances
Right, I did not think about that possibility. We could raise an error in that case but it is not a satisfying solution.
> One such solution might be this script inside of pylearn2 which I'm about to try out.
I believe @alattner tried that to no avail.
> I'm thinking that as long as we can't fix this in the general case, we shouldn't put code like this into nolearn.lasagne itself.
I agree but it would be nice to be able to somehow use this kludge by checking out a specific nolearn branch or something.
> I believe @alattner tried that to no avail.
No, I haven't tried that script inside pylearn2. I tried the `config.experimental.unpickle_gpu_on_cpu` option with no success.
I tried the script and it failed with some weird recursion error.
`sys.setrecursionlimit(10 ** 999)`
OK just let me know if that's a joke or if it actually works. ;-)
I would not try it :)
Anyway, do you see a working solution for this?
I'll take another look next week. So far didn't have much luck.
So much for not breaking code in this PR :)
For those who use this snippet, in the part shown below, change '_output_layer' to '_output_layers':
```python
for attr in (
        'train_iter_',
        'eval_iter_',
        'predict_iter_',
        '_initialized',
        '_get_output_fn_cache',
        '_output_layer',
        'layers_',
        'layers',
        ):
    if attr in state:
        del state[attr]
```
Is there any update on how to train on GPU, save, and load on CPU for inference?
@kungfujam Note that as per the original post, you can always do this:
- `save_params_to` on the GPU machine
- initialize an identical NeuralNet on the CPU machine
- `load_params_from` on that machine.
This issue is about not being able to take a network that was pickled after training on a GPU and use it in a CPU environment, which is sometimes more convenient.
Thanks for the clarification. I'm using that method currently. Not a huge deal, but it would be nice to be able to just pickle the net.