seq2seq
When trying to use model.save_weights, I get the h5py "name already exists" error
The model I use is a simple seq2seq attention model. When I try to save the model weights after 5 epochs, I get the following error. It looks like h5py is trying to create a dataset with a name that already exists. Why does this happen? Thanks in advance.
@farizrahman4u
```python
model = AttentionSeq2Seq(input_dim=4096, input_length=5, hidden_dim=4096,
                         output_length=5, output_dim=300, depth=(1, 1))
# ... inside a loop over training stages (i and num are defined elsewhere) ...
print "Number of stage", i
if i > num:
    # model.fit(Sentenceseq, Imageseq, batch_size=100, nb_epoch=5,
    #           validation_data=(val_Sentenceseq, val_Imageseq), shuffle=True)
    model.fit(Imageseq, Sentenceseq, batch_size=100, nb_epoch=5,
              validation_data=(val_Imageseq, val_Sentenceseq), shuffle=True)
    print "Checkpoint saved"
    model.save_weights('./model_seq2seqAttention/rcn_' + str(i) + '.hdf5')
```
Train on 38064 samples, validate on 4798 samples
Epoch 1/5
38064/38064 [==============================] - 2555s - loss: 7042.3301 - acc: 0.0050 - val_loss: 9680.7603 - val_acc: 0.0072
Epoch 2/5
38064/38064 [==============================] - 2561s - loss: 5921.5143 - acc: 0.0042 - val_loss: 9544.5769 - val_acc: 0.0042
Epoch 3/5
38064/38064 [==============================] - 2573s - loss: 5748.3671 - acc: 0.0043 - val_loss: 9588.1486 - val_acc: 0.0035
Epoch 4/5
38064/38064 [==============================] - 2570s - loss: 5782.3526 - acc: 0.0063 - val_loss: 9661.4667 - val_acc: 0.0038
Epoch 5/5
38064/38064 [==============================] - 2568s - loss: 5550.6135 - acc: 0.0041 - val_loss: 9390.9714 - val_acc: 0.0045
Checkpoint saved
```
RuntimeError Traceback (most recent call last)
<ipython-input-1-85bf116cee1d> in <module>()
107 shuffle=True)
108 print "Checkpoint saved"
--> 109 model.save_weights('./model_seq2seqAttention/rcn_'+str(i)+'.hdf5')
/usr/local/lib/python2.7/dist-packages/keras/engine/topology.pyc in save_weights(self, filepath, overwrite)
2446 return
2447 f = h5py.File(filepath, 'w')
-> 2448 self.save_weights_to_hdf5_group(f)
2449 f.flush()
2450 f.close()
/usr/local/lib/python2.7/dist-packages/keras/engine/topology.pyc in save_weights_to_hdf5_group(self, f)
2473 for name, val in zip(weight_names, weight_values):
2474 param_dset = g.create_dataset(name, val.shape,
-> 2475 dtype=val.dtype)
2476 if not val.shape:
2477 # scalar
/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.pyc in create_dataset(self, name, shape, dtype, data, **kwds)
104 dset = dataset.Dataset(dsid)
105 if name is not None:
--> 106 self[name] = dset
107 return dset
108
h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-zgFvsS/h5py/h5py/_objects.c:2574)()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-zgFvsS/h5py/h5py/_objects.c:2533)()
/usr/local/lib/python2.7/dist-packages/h5py/_hl/group.pyc in __setitem__(self, name, obj)
266
267 if isinstance(obj, HLObject):
--> 268 h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
269
270 elif isinstance(obj, SoftLink):
h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-zgFvsS/h5py/h5py/_objects.c:2574)()
h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-zgFvsS/h5py/h5py/_objects.c:2533)()
h5py/h5o.pyx in h5py.h5o.link (/tmp/pip-build-zgFvsS/h5py/h5py/h5o.c:3713)()
RuntimeError: Unable to create link (Name already exists)
```
Same problem here (https://github.com/farizrahman4u/seq2seq/issues/125). So far I've just found out that the error only arises if your AttentionSeq2Seq is bidirectional.
I just printed the names that h5py creates (line 2473 of keras/engine/topology.py). When bidirectional=True, I get the following; there are clearly repeated names, such as two lstmcell_1_W entries:
['lstmcell_1_W', 'lstmcell_1_U', 'lstmcell_1_b', 'lstmcell_2_W', 'lstmcell_2_U', 'lstmcell_2_b', 'lstmcell_3_W', 'lstmcell_1_W', 'lstmcell_1_U', 'lstmcell_1_b', 'lstmcell_2_W', 'lstmcell_2_U', 'lstmcell_2_b', 'lstmcell_3_W']
whereas with bidirectional=False, I get the following:
['lstmcell_1_W', 'lstmcell_1_U', 'lstmcell_1_b', 'lstmcell_2_W', 'lstmcell_2_U', 'lstmcell_2_b', 'lstmcell_3_W', 'lstmcell_3_U', 'lstmcell_3_b', 'lstmcell_4_W', 'lstmcell_4_U', 'lstmcell_4_b'] ['attentiondecodercell_1_W1', 'attentiondecodercell_1_W2', 'attentiondecodercell_1_W3', 'attentiondecodercell_1_U', 'attentiondecodercell_1_b1', 'attentiondecodercell_1_b2', 'attentiondecodercell_1_b3', 'lstmdecodercell_1_W1', 'lstmdecodercell_1_W2', 'lstmdecodercell_1_U', 'lstmdecodercell_1_b1', 'lstmdecodercell_1_b2', 'lstmdecodercell_2_W1', 'lstmdecodercell_2_W2', 'lstmdecodercell_2_U', 'lstmdecodercell_2_b1', 'lstmdecodercell_2_b2', 'lstmdecodercell_3_W1', 'lstmdecodercell_3_W2', 'lstmdecodercell_3_U', 'lstmdecodercell_3_b1', 'lstmdecodercell_3_b2']
I tried to read the Keras source code but still cannot locate the root cause. It looks like the backward and forward LSTMs end up with the same weight names, so I'm just sharing this finding for discussion.
@ishalyminov @farizrahman4u
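A rough diagnostic sketch along these lines (it assumes the usual Keras layer API, i.e. model.layers and layer.weights, with backend weight variables that carry a .name attribute, which is also what keras/engine/topology.py relies on when saving): list, per layer, the weight names that save_weights would write into that layer's HDF5 group and flag any that repeat.

```python
from collections import Counter

# Diagnostic sketch: flag layers whose weight variables share a name, since
# save_weights creates one HDF5 dataset per weight name inside each layer's group.
for layer in model.layers:
    names = [getattr(w, 'name', 'param_' + str(idx))
             for idx, w in enumerate(layer.weights)]
    repeated = [n for n, count in Counter(names).items() if count > 1]
    if repeated:
        print(layer.name + ': duplicated weight names ' + str(repeated))
```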
@JiaqingFu very useful, thanks!
I met the same problem, and instead of save_weights()/load_weights() I used model.get_weights() and model.set_weights(weight) together with h5py directly. It works for me.

import h5py

# Loading: read the arrays back in order and assign them by position
file = h5py.File(fileName, 'r')
weight = []
for i in range(len(file.keys())):
    weight.append(file['weight' + str(i)][:])
model.set_weights(weight)
file.close()

# ...
# ...

# Saving: dump each weight array into its own numbered dataset
file = h5py.File(fileName, 'w')
weight = model.get_weights()
for i in range(len(weight)):
    file.create_dataset('weight' + str(i), data=weight[i])
file.close()
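A small sanity-check sketch for this approach: since get_weights()/set_weights() match arrays purely by position rather than by name, it is worth confirming that the count and shapes of the loaded arrays line up with the freshly built model before assigning them.

```python
# Sanity-check sketch: set_weights matches arrays by position, not by name, so
# the count and shapes of the loaded arrays must agree with the current model.
current = model.get_weights()
assert len(weight) == len(current)
for loaded, existing in zip(weight, current):
    assert loaded.shape == existing.shape
model.set_weights(weight)
```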
Same problem here. I was trying to train a CNN, and while trying to use save_weights or ModelCheckpoint I got the same error.
I had a similar issue. One of my modules followed the naming convention 'modulename/layer_x'. Changing the '/' to '_' resolved it. I'm guessing '/' gets treated as a path delimiter in the HDF5 file, so any layer named with this convention can end up colliding under the same group.
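To illustrate the rename (a hypothetical sketch; the layer and names are made up), HDF5 does treat '/' as a group path separator, so keeping the whole layer name as a single flat key avoids the nested-path collision:

```python
from keras.layers import Dense

# Hypothetical layer names for illustration only: '/' inside a name becomes a
# nested HDF5 path when save_weights writes the file, which is where the
# "Name already exists" collisions can come from.
bad = Dense(64, name='modulename/layer_1')

# Replacing the delimiter keeps the whole name as one flat key in the HDF5 file.
good = Dense(64, name='modulename/layer_1'.replace('/', '_'))  # 'modulename_layer_1'
```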
> Same problem here: #125 So far I've just found out that the error only arises if your AttentionSeq2Seq is bidirectional.

Same issue. Did you find a solution?
@v-chuqin OMG, you are amazing! I was having this problem with TensorFlow Probability: it wouldn't save my weights, but using your technique it works. The only modification: you have to call model.fit() and let it run for one epoch to initialize the weights, e.g.:

model.fit(X_train, y_train, epochs=1, verbose=2, batch_size=512,
          validation_data=(X_valid, y_valid), callbacks=[earlystopper])  # ,mc])

file = h5py.File('best_TFPROB.h5py', 'r')
weight = []
for i in range(len(file.keys())):
    weight.append(file['weight' + str(i)][:])
model.set_weights(weight)
file.close()
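(One assumption to keep in mind: the 'best_TFPROB.h5py' file must already have been written with the same 'weight' + str(i) dataset layout as the save loop above; a checkpoint produced by ModelCheckpoint or save_weights uses per-layer groups and would not contain those keys.)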
Thanks!!!!!!!!!!!!!
It's better to find out why your model has duplicate variable names.