CNN_sentence

Save trained model

Open Huarong opened this issue 9 years ago • 24 comments

Hi, can you add a method for saving the trained model for future prediction?

Huarong avatar Jul 31 '15 06:07 Huarong

Same question. Did you find a way?

chaseleecn avatar Nov 10 '15 03:11 chaseleecn

Once you are happy with your trained network, you can use Python's pickle module to serialize the object to a file with the dump method, and load it back with the load method. Python pickle: https://docs.python.org/2/library/pickle.html
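
A minimal sketch of that round trip, assuming Python 3's pickle (the thread itself uses Python 2, where cPickle plays the same role) and a toy dictionary standing in for the trained object. The names toy_model and model.pkl are illustrative only:

```python
import os
import pickle
import tempfile

# Toy stand-in for a trained object (hypothetical data, not from the repo).
toy_model = {"W": [[0.1, -0.2], [0.3, 0.4]], "b": [0.0, 0.5]}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Serialize with dump(); the file must be opened in binary mode.
with open(path, "wb") as f:
    pickle.dump(toy_model, f, protocol=pickle.HIGHEST_PROTOCOL)

# Deserialize with load(); the result is an equal but separate object.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == toy_model)   # True
print(restored is toy_model)   # False
```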

hadyelsahar avatar Nov 10 '15 11:11 hadyelsahar

In this code, which object should I serialize? I've tried to use the dump method to serialize 'classifier', but it didn't work. PicklingError: Can't pickle <type 'instancemethod'>

chaseleecn avatar Nov 10 '15 13:11 chaseleecn

@guanxingke you should serialize the object named "params", which stores the parameters of the model. When you classify a new document, you should reconstruct the neural network and set its parameters to the values you saved.
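
A minimal sketch of why this helps, assuming Python 3 and a toy class rather than the repo's MLPDropout. Pickling the whole object fails as soon as one attribute cannot be pickled (here a lambda stands in for whatever triggers the instancemethod error above, which is an assumption), while the plain parameter values pickle fine and can be set on a freshly constructed network:

```python
import pickle

class ToyNet:
    """Hypothetical stand-in for a network: plain weights plus a callable."""
    def __init__(self, weights):
        self.params = list(weights)
        # A lambda cannot be pickled, which makes the whole object unpicklable.
        self.activation = lambda v: max(0.0, v)

    def predict(self, x):
        return self.activation(sum(w * xi for w, xi in zip(self.params, x)))

trained = ToyNet([0.5, -1.0, 2.0])

# Attempt 1: pickle the whole object.
try:
    pickle.dumps(trained)
    whole_object_ok = True
except (pickle.PicklingError, AttributeError, TypeError):
    whole_object_ok = False

# Attempt 2: pickle only the parameters.
blob = pickle.dumps(trained.params, protocol=pickle.HIGHEST_PROTOCOL)

# Later: rebuild a network with the same architecture, then set the saved values.
rebuilt = ToyNet([0.0, 0.0, 0.0])
rebuilt.params = pickle.loads(blob)

x = [1.0, 1.0, 1.0]
print(whole_object_ok)                            # False
print(rebuilt.predict(x) == trained.predict(x))   # True
```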

MarkWuNLP avatar Nov 12 '15 07:11 MarkWuNLP

@guanxingke you can use what @MarkWuNLP has mentioned above. Instead, I restructured the code into a class with two methods.

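def save(self, path):
    # Pickle the whole instance; -1 selects the highest available protocol.
    with open(path, 'wb') as f:
        pickle.dump(self, f, -1)
    logger.info('save model to path %s' % path)
    return None

@classmethod
def load(cls, path):
    # A classmethod receives the class itself, conventionally named cls.
    with open(path, 'rb') as f:
        return pickle.load(f)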
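
Pickling self only works when every attribute on the instance can be pickled. A minimal sketch of a variant that avoids the problem, assuming Python 3: it pickles only the plain state and rebuilds the object through its constructor on load. The class ToyClassifier and its attributes are hypothetical:

```python
import os
import pickle
import tempfile

class ToyClassifier:
    """Hypothetical model: picklable weights plus a non-picklable callable."""
    def __init__(self, weights):
        self.weights = list(weights)
        # Rebuilt by the constructor on every load, so it never reaches pickle.
        self._score = lambda x: sum(w * xi for w, xi in zip(self.weights, x))

    def predict(self, x):
        return 1 if self._score(x) > 0 else 0

    def save(self, path):
        # Write only the plain data, not the instance itself.
        with open(path, "wb") as f:
            pickle.dump({"weights": self.weights}, f, pickle.HIGHEST_PROTOCOL)

    @classmethod
    def load(cls, path):
        with open(path, "rb") as f:
            state = pickle.load(f)
        return cls(state["weights"])

path = os.path.join(tempfile.mkdtemp(), "classifier.pkl")
clf = ToyClassifier([1.0, -2.0])
clf.save(path)
loaded = ToyClassifier.load(path)
print(loaded.weights, loaded.predict([3.0, 1.0]))   # [1.0, -2.0] 1
```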

Huarong avatar Nov 13 '15 02:11 Huarong

@MarkWuNLP Thanks for your answer. I still have some questions. There are two params: ① params in the function train_conv_net, and ② classifier.params. I assume that you mean the first one. When should I serialize the params, and when should I load them?

This is my code:

params = classifier.params
for conv_layer in conv_layers:
    params += conv_layer.params
if non_static:
    #if word vectors are allowed to change, add them as model parameters
    params += [Words]

# f = open("params.save", "wb")
# cPickle.dump(params, f)
# f.close()
# print 'Params saved.'

print "Loading params..."
fr = open("params.save","rb")
params = cPickle.load(fr)
fr.close()

I save the params first and load them on the next run. It doesn't work:

Loading params...
Traceback (most recent call last):
  File "D:\Program Files (x86)\JetBrains\PyCharm Community Edition 4.5.4\helpers\pydev\pydevd.py", line 2358, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "D:\Program Files (x86)\JetBrains\PyCharm Community Edition 4.5.4\helpers\pydev\pydevd.py", line 1778, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "F:/PythonWorkspace/CNN_sentence-master/conv_net_sentence.py", line 339, in <module>
    dropout_rate=[0.5])
  File "F:/PythonWorkspace/CNN_sentence-master/conv_net_sentence.py", line 115, in train_conv_net
    grad_updates = sgd_updates_adadelta(params, dropout_cost, lr_decay, 1e-6, sqr_norm_lim)
  File "F:/PythonWorkspace/CNN_sentence-master/conv_net_sentence.py", line 233, in sgd_updates_adadelta
    gp = T.grad(cost, param)
  File "C:\Python27\lib\site-packages\theano-0.7.0-py2.7.egg\theano\gradient.py", line 529, in grad
    handle_disconnected(elem)
  File "C:\Python27\lib\site-packages\theano-0.7.0-py2.7.egg\theano\gradient.py", line 516, in handle_disconnected
    raise DisconnectedInputError(message)
theano.gradient.DisconnectedInputError: grad method was asked to compute the gradient with respect to a variable that is not part of the computational graph of the cost, or is used only by a non-differentiable operator: W

If I load the params before the "while (epoch < n_epochs):" loop, I don't see how the params could affect my result.

chaseleecn avatar Nov 13 '15 03:11 chaseleecn

@Huarong Thanks for your answer. I assume the class you mean is "class MLPDropout". I save the class like this:

classifier.save('classifier.save')

It's the same error I mentioned before:

Traceback (most recent call last):
  File "D:\Program Files (x86)\JetBrains\PyCharm Community Edition 4.5.4\helpers\pydev\pydevd.py", line 2358, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "D:\Program Files (x86)\JetBrains\PyCharm Community Edition 4.5.4\helpers\pydev\pydevd.py", line 1778, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "F:/PythonWorkspace/CNN_sentence-master/conv_net_sentence.py", line 342, in <module>
    dropout_rate=[0.5])
  File "F:/PythonWorkspace/CNN_sentence-master/conv_net_sentence.py", line 170, in train_conv_net
    classifier.save('classifier.save')
  File "conv_net_classes.py", line 183, in save
    pickle.dump(self, f, -1)
  File "C:\Python27\lib\pickle.py", line 1370, in dump
    Pickler(file, protocol).dump(obj)
  File "C:\Python27\lib\pickle.py", line 224, in dump
    self.save(obj)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 419, in save_reduce
    save(state)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 649, in save_dict
    self._batch_setitems(obj.iteritems())
  File "C:\Python27\lib\pickle.py", line 681, in _batch_setitems
    save(v)
  File "C:\Python27\lib\pickle.py", line 331, in save
    self.save_reduce(obj=obj, *rv)
  File "C:\Python27\lib\pickle.py", line 396, in save_reduce
    save(cls)
  File "C:\Python27\lib\pickle.py", line 286, in save
    f(self, obj) # Call unbound method with explicit self
  File "C:\Python27\lib\pickle.py", line 748, in save_global
    (obj, module, name))
pickle.PicklingError: Can't pickle <type 'instancemethod'>: it's not found as __builtin__.instancemethod

chaseleecn avatar Nov 13 '15 03:11 chaseleecn

@guanxingke No, I mean both the CNN parameters and the MLP features. I used cPickle to store the params successfully.

savefile = file('obj.save', 'wb')
cPickle.dump(params, savefile, protocol=cPickle.HIGHEST_PROTOCOL)

Furthermore, when you use the stored features, conv_layer.W_conv and conv_layer.b_conv should be reset, as well as self.dropout_layers.W and self.dropout_layers.b.

MarkWuNLP avatar Nov 13 '15 03:11 MarkWuNLP

@MarkWuNLP I stored the params successfully too, but how do I use them to make predictions on text I write myself? Could you show some examples, such as when and how to load the params and make further predictions? Thanks again.

chaseleecn avatar Nov 17 '15 08:11 chaseleecn

@ChasonLee I have the same problem as you. Did you find an answer to your last question ? Any example ? Thanks

huydan avatar Mar 18 '16 15:03 huydan

@huydan , @ChasonLee Did you guys find any example ?_?

Muugii-bs avatar Jun 16 '16 13:06 Muugii-bs

@ChasonLee , @huydan , @Muugii-bs Hey guys, I have made some modifications to the code so that further predictions can be made on some test examples. You can find it here: https://github.com/DeepanwayGhosal/CNN_sentence Let me know if this works.

deepanwayx avatar Jun 18 '16 20:06 deepanwayx

Thank you @DeepanwayGhosal 👍 I have a question. Where in the code, do you load the saved parameters ?

Muugii-bs avatar Jun 20 '16 07:06 Muugii-bs

@Muugii-bs I don't actually save the parameters. Along with the train and validation sets, I also pass the test set to the function train_conv_net(), and it returns the predicted test labels.

In the train_conv_net() function you can find a code snippet between these two comment lines which predicts the test labels.

# So we make prediction only by taking a maximum of 2000 test examples at a time
......
......
.......
#start training over mini-batches

@huydan mentioned in his last comment that he wants a prediction example, so that is all I have done here. I will try to make a model which saves and loads the parameters and can make predictions without passing the test set to the train_conv_net() function.

deepanwayx avatar Jun 20 '16 07:06 deepanwayx

Hi, I think I know how to save and load the model. You can save the params of the model and load them with the set_value() function. Example:

save:
with open('./modelfile', 'wb') as f:
    cPickle.dump(classifier.params, f, -1)

load:
with open('./model/classifier4class_all20160728.params', 'rb') as f:
    tmp = cPickle.load(f)
for i in range(len(classifier.params)):
    classifier.params[i].set_value(tmp[i].get_value())

Of course, before loading the parameters you should define the classifier, just as in the training process.
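
The in-place set_value() call matters because the prediction functions hold references to the shared variables created when the classifier was defined. Rebinding a Python name to freshly unpickled variables leaves the network unchanged, which is likely also why replacing params with unpickled objects led to the DisconnectedInputError earlier in the thread. A minimal sketch of that distinction, assuming Python 3 and a hypothetical SharedParam class that only mimics the get_value()/set_value() interface (it is not Theano itself, and it pickles plain values rather than shared variables):

```python
import pickle

class SharedParam:
    """Hypothetical stand-in mimicking get_value()/set_value() of a shared variable."""
    def __init__(self, value):
        self._value = value

    def get_value(self):
        return self._value

    def set_value(self, value):
        self._value = value

def build_model():
    # "Define the classifier just as in training": fresh, untrained parameters.
    params = [SharedParam(0.0), SharedParam(0.0)]   # weight, bias

    def predict(x):
        # predict captures these exact objects, much like a compiled graph does.
        return params[0].get_value() * x + params[1].get_value()

    return params, predict

# Training run: pretend these values were learned, then save the plain values.
trained_params, _ = build_model()
trained_params[0].set_value(2.0)
trained_params[1].set_value(1.0)
blob = pickle.dumps([p.get_value() for p in trained_params])

# Prediction run: rebuild the model first, then load.
params, predict = build_model()
saved_values = pickle.loads(blob)

# Not enough: new objects that the model never looks at.
rebound = [SharedParam(v) for v in saved_values]
before = predict(3.0)

# Works: update the existing objects in place.
for p, v in zip(params, saved_values):
    p.set_value(v)
after = predict(3.0)

print(before, after)   # 0.0 7.0
```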

521088684 avatar Jul 29 '16 01:07 521088684

@deepanwayx @521088684 can you please explain more about how the code should change, if we want to save the trained model and load it in the test time to predict the sentiment for one instance only?

Monireh2 avatar Oct 31 '16 23:10 Monireh2

@ChasonLee did you find any example on how you should save and then predict based on your saved model? Can you please share those examples?

Monireh2 avatar Oct 31 '16 23:10 Monireh2

@521088684 Can you please specify where exactly I should dump and where I should load the model? I have dumped it at the end of this if: if val_perf >= best_val_perf:

and loaded it after

classifier = MLPDropout(rng, input=layer1_input, layer_sizes=hidden_units, activations=activations, dropout_rates=dropout_rate)

#define parameters of the model and update functions using adadelta
params = classifier.params     
for conv_layer in conv_layers:
    params += conv_layer.params
if non_static:
    #if word vectors are allowed to change, add them as model parameters
    params += [Words]

and of course removed the training part and tested it on my test set.

But the problem is that when I apply it to my test data, most instances are assigned to the negative class, which does not make sense given the accuracy I got.

Monireh2 avatar Nov 04 '16 02:11 Monireh2

Has anyone already successfully implemented saving and loading the model? How can I save the trained model and load it at test time to predict the sentiment for one instance only?

Denybarros avatar Jan 08 '17 13:01 Denybarros

Same request. If someone has worked this out successfully, it would be good to get some insight.

abishekh avatar Jan 29 '17 22:01 abishekh

Hi,

for me the suggestion from @521088684 works. What I actually did was add the following method to the MLPDropout class:

def save(self, path):
    with open(path, 'wb') as f:
        pickle.dump(self.params, f, -1)
    return None

Then, in the train_conv_net() method, you can insert classifier.save("/.../") anywhere after the initialization (ideally after checking whether the current model is the best one).
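
What self.params contains at save time depends on where the call is placed. In train_conv_net(), params = classifier.params followed by params += ... extends the same list object, so after those lines classifier.params also holds the convolution-layer parameters and, if non_static is set, the word vectors. A minimal sketch of that Python behaviour, assuming classifier.params is a plain list; the class and the string placeholders are hypothetical:

```python
class ToyClassifier:
    """Hypothetical stand-in; only the params list matters here."""
    def __init__(self):
        self.params = ["mlp_W", "mlp_b"]

classifier = ToyClassifier()
saved_early = list(classifier.params)   # snapshot before the extension

params = classifier.params              # same list object, not a copy
params += ["conv_W", "conv_b"]          # += extends the list in place
params += ["Words"]

saved_late = list(classifier.params)    # snapshot after the extension

print(params is classifier.params)      # True
print(saved_early)                      # ['mlp_W', 'mlp_b']
print(saved_late)                       # ['mlp_W', 'mlp_b', 'conv_W', 'conv_b', 'Words']
```

Under that assumption, a file written before the extension would not contain the convolution filters, so the save call is safer after those lines.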

Then I added a new method to the conv_net_sentences.py file, which is essentially a copy of the train_conv_net() method with a few changes. The first part of this method looks like this:

rng = np.random.RandomState(3435)
img_h = len(datasets[0])-1
filter_w = img_w
feature_maps = hidden_units[0]
filter_shapes = []
pool_sizes = []
for filter_h in filter_hs:
    filter_shapes.append((feature_maps, 1, filter_h, filter_w))
    pool_sizes.append((img_h-filter_h+1, img_w-filter_w+1))

'''
define model architecture
'''
index = T.lscalar()
x = T.matrix('x')
y = T.ivector('y')
Words = theano.shared(value=U, name="Words")
zero_vec_tensor = T.vector()
zero_vec = np.zeros(img_w)
set_zero = theano.function([zero_vec_tensor], updates=[(Words, T.set_subtensor(Words[0, :], zero_vec_tensor))],
                           allow_input_downcast=True)
layer0_input = Words[T.cast(x.flatten(), dtype="int32")].reshape((x.shape[0], 1, x.shape[1], Words.shape[1]))
conv_layers = []
layer1_inputs = []
for i in xrange(len(filter_hs)):
    filter_shape = filter_shapes[i]
    pool_size = pool_sizes[i]
    conv_layer = LeNetConvPoolLayer(rng, input=layer0_input, image_shape=(batch_size, 1, img_h, img_w),
                                    filter_shape=filter_shape, poolsize=pool_size, non_linear=conv_non_linear)
    layer1_input = conv_layer.output.flatten(2)
    conv_layers.append(conv_layer)
    layer1_inputs.append(layer1_input)
layer1_input = T.concatenate(layer1_inputs, 1)
hidden_units[0] = feature_maps*len(filter_hs)
classifier = MLPDropout(rng, input=layer1_input, layer_sizes=hidden_units, activations=activations,
                        dropout_rates=dropout_rate)

'''
define parameters of the model and update functions using adadelta
'''
params = classifier.params
for conv_layer in conv_layers:
    params += conv_layer.params
if non_static:
    # if word vectors are allowed to change, add them as model parameters
    params += [Words]
cost = classifier.negative_log_likelihood(y)
dropout_cost = classifier.dropout_negative_log_likelihood(y)
grad_updates = sgd_updates_adadelta(params, dropout_cost, lr_decay, 1e-6, sqr_norm_lim)

'''
load model parameters
'''
with open(params_file, 'rb') as f:
    tmp = cPickle.load(f)

for i in range(len(classifier.params)):
    classifier.params[i].set_value(tmp[i].get_value())

del tmp

'''
prepare datasets
2) split x and y axis
'''
np.random.seed(3435)

test_set_x = datasets[:, :img_h]
test_set_y = np.asarray(datasets[:, -1], "int32")

'''
models
'''
test_pred_layers = []
test_size = batch_size              # modified line

test_layer0_input = Words[T.cast(x.flatten(), dtype="int32")].reshape((test_size, 1, img_h, Words.shape[1]))
for conv_layer in conv_layers:
    test_layer0_output = conv_layer.predict(test_layer0_input, test_size)
    test_pred_layers.append(test_layer0_output.flatten(2))

test_layer1_input = T.concatenate(test_pred_layers, 1)
test_y_pred = classifier.predict(test_layer1_input)
test_y_pred_p = classifier.predict_p(test_layer1_input)
test_y_pred_p_reduce = test_y_pred_p[:, 0]
test_error = T.mean(T.neq(test_y_pred, y))
test_model_all = theano.function([x, y], test_error, allow_input_downcast=True)
test_predict = theano.function([x], test_y_pred, allow_input_downcast=True)
test_probs = theano.function([x], test_y_pred_p_reduce, allow_input_downcast=True)

Afterwards you only have to split the data into batches and call the test_ functions :-)
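
A minimal sketch of that last step, assuming Python 3, plain lists in place of NumPy arrays, and a hypothetical predict_batch callable standing in for test_predict. Because the graph above is built for exactly batch_size rows, the sketch pads the final batch by repeating its last row and discards the padded predictions:

```python
def predict_in_batches(rows, batch_size, predict_batch):
    """Call predict_batch on fixed-size batches; pad the last one and trim the result."""
    predictions = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        n_real = len(batch)
        # Pad with copies of the last row so every call sees batch_size rows.
        batch = batch + [batch[-1]] * (batch_size - n_real)
        # Keep only the predictions that belong to real rows.
        predictions.extend(predict_batch(batch)[:n_real])
    return predictions

def fake_predict(batch):
    # Hypothetical stand-in for the compiled test_predict function.
    assert len(batch) == 4
    return [1 if sum(row) > 0 else 0 for row in batch]

rows = [[1, 2], [-3, 1], [0, 5], [-1, -1], [2, -1], [-4, 0]]
labels = predict_in_batches(rows, batch_size=4, predict_batch=fake_predict)
print(labels)   # [1, 0, 1, 0, 1, 0]
```

For a single instance, the same helper can be called with a one-element list; the padding then fills the rest of the batch.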

pexmar avatar May 27 '17 21:05 pexmar

@pexmar could you give examples of how to use these test_-functions? I have no idea about their functions. Thank you so much

Zero0one1 avatar Apr 28 '20 15:04 Zero0one1

@Zero0one1 Did you succeed in using these functions?

moses9591 avatar Sep 22 '20 13:09 moses9591

> @Zero0one1 Did you succeed in using these functions?

No. I saved the model successfully but don't know how to predict on new input. Do you have any idea?

Zero0one1 avatar Oct 18 '20 09:10 Zero0one1