
When trying to use model.save_weights, I get an h5py "name already exists" error

Open JiaqingFu opened this issue 8 years ago • 9 comments

The model I used is a simple seq2seq attention model. When I try to save the model weights after 5 epochs, I get the following error. It looks as if h5py is creating a dataset under a name that already exists. Why does this happen? Thanks in advance. @farizrahman4u

    model = AttentionSeq2Seq(input_dim=4096, input_length=5, hidden_dim=4096,
                             output_length=5, output_dim=300, depth=(1, 1))

    print "Number of stage", i
    if i > num:
        # model.fit(Sentenceseq, Imageseq, batch_size=100, nb_epoch=5,
        #           validation_data=(val_Sentenceseq, val_Imageseq), shuffle=True)
        model.fit(Imageseq, Sentenceseq, batch_size=100, nb_epoch=5,
                  validation_data=(val_Imageseq, val_Sentenceseq), shuffle=True)
        print "Checkpoint saved"
        model.save_weights('./model_seq2seqAttention/rcn_' + str(i) + '.hdf5')

    Train on 38064 samples, validate on 4798 samples
    Epoch 1/5
    38064/38064 [==============================] - 2555s - loss: 7042.3301 - acc: 0.0050 - val_loss: 9680.7603 - val_acc: 0.0072
    Epoch 2/5
    38064/38064 [==============================] - 2561s - loss: 5921.5143 - acc: 0.0042 - val_loss: 9544.5769 - val_acc: 0.0042
    Epoch 3/5
    38064/38064 [==============================] - 2573s - loss: 5748.3671 - acc: 0.0043 - val_loss: 9588.1486 - val_acc: 0.0035
    Epoch 4/5
    38064/38064 [==============================] - 2570s - loss: 5782.3526 - acc: 0.0063 - val_loss: 9661.4667 - val_acc: 0.0038
    Epoch 5/5
    38064/38064 [==============================] - 2568s - loss: 5550.6135 - acc: 0.0041 - val_loss: 9390.9714 - val_acc: 0.0045
    Checkpoint saved

    RuntimeError                              Traceback (most recent call last)
    <ipython-input-1-85bf116cee1d> in <module>()
        107                   shuffle=True)
        108         print "Checkpoint saved"
    --> 109         model.save_weights('./model_seq2seqAttention/rcn_'+str(i)+'.hdf5')

    /usr/local/lib/python2.7/dist-packages/keras/engine/topology.pyc in save_weights(self, filepath, overwrite)
       2446                 return
       2447         f = h5py.File(filepath, 'w')
    -> 2448         self.save_weights_to_hdf5_group(f)
       2449         f.flush()
       2450         f.close()

    /usr/local/lib/python2.7/dist-packages/keras/engine/topology.pyc in save_weights_to_hdf5_group(self, f)
       2473             for name, val in zip(weight_names, weight_values):
       2474                 param_dset = g.create_dataset(name, val.shape,
    -> 2475                                               dtype=val.dtype)
       2476                 if not val.shape:
       2477                     # scalar

    /usr/local/lib/python2.7/dist-packages/h5py/_hl/group.pyc in create_dataset(self, name, shape, dtype, data, **kwds)
        104             dset = dataset.Dataset(dsid)
        105             if name is not None:
    --> 106                 self[name] = dset
        107             return dset
        108

    h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-zgFvsS/h5py/h5py/_objects.c:2574)()

    h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-zgFvsS/h5py/h5py/_objects.c:2533)()

    /usr/local/lib/python2.7/dist-packages/h5py/_hl/group.pyc in __setitem__(self, name, obj)
        266
        267         if isinstance(obj, HLObject):
    --> 268             h5o.link(obj.id, self.id, name, lcpl=lcpl, lapl=self._lapl)
        269
        270         elif isinstance(obj, SoftLink):

    h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-zgFvsS/h5py/h5py/_objects.c:2574)()

    h5py/_objects.pyx in h5py._objects.with_phil.wrapper (/tmp/pip-build-zgFvsS/h5py/h5py/_objects.c:2533)()

    h5py/h5o.pyx in h5py.h5o.link (/tmp/pip-build-zgFvsS/h5py/h5py/h5o.c:3713)()

    RuntimeError: Unable to create link (Name already exists)

JiaqingFu avatar Nov 08 '16 07:11 JiaqingFu

Same problem here: https://github.com/farizrahman4u/seq2seq/issues/125 So far I've just found out that the error only arises if your AttentionSeq2Seq is bidirectional.

ishalyminov avatar Nov 08 '16 10:11 ishalyminov

I printed the names that h5py creates (line 2473 of keras/engine/topology.py). With bidirectional=True I get the following; there are clearly repeated names, such as two occurrences of 'lstmcell_1_W':

    ['lstmcell_1_W', 'lstmcell_1_U', 'lstmcell_1_b', 'lstmcell_2_W', 'lstmcell_2_U', 'lstmcell_2_b', 'lstmcell_3_W',
     'lstmcell_1_W', 'lstmcell_1_U', 'lstmcell_1_b', 'lstmcell_2_W', 'lstmcell_2_U', 'lstmcell_2_b', 'lstmcell_3_W']

With bidirectional=False I get the following instead:

    ['lstmcell_1_W', 'lstmcell_1_U', 'lstmcell_1_b', 'lstmcell_2_W', 'lstmcell_2_U', 'lstmcell_2_b', 'lstmcell_3_W',
     'lstmcell_3_U', 'lstmcell_3_b', 'lstmcell_4_W', 'lstmcell_4_U', 'lstmcell_4_b']
    ['attentiondecodercell_1_W1', 'attentiondecodercell_1_W2', 'attentiondecodercell_1_W3', 'attentiondecodercell_1_U',
     'attentiondecodercell_1_b1', 'attentiondecodercell_1_b2', 'attentiondecodercell_1_b3',
     'lstmdecodercell_1_W1', 'lstmdecodercell_1_W2', 'lstmdecodercell_1_U', 'lstmdecodercell_1_b1', 'lstmdecodercell_1_b2',
     'lstmdecodercell_2_W1', 'lstmdecodercell_2_W2', 'lstmdecodercell_2_U', 'lstmdecodercell_2_b1', 'lstmdecodercell_2_b2',
     'lstmdecodercell_3_W1', 'lstmdecodercell_3_W2', 'lstmdecodercell_3_U', 'lstmdecodercell_3_b1', 'lstmdecodercell_3_b2']

I tried to read the Keras source code but still cannot locate the cause; it looks as if the backward and forward LSTMs have the same weight names. I am sharing this finding for discussion. @ishalyminov @farizrahman4u
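A quick way to confirm the clash before calling save_weights is to scan the weight names for duplicates. This is a minimal, self-contained sketch (`find_duplicate_names` is a hypothetical helper, not part of Keras); the name list is the one printed above for the bidirectional model:

```python
from collections import Counter

def find_duplicate_names(names):
    """Return the weight names that occur more than once, sorted."""
    return sorted(n for n, c in Counter(names).items() if c > 1)

# Names printed from the bidirectional model above: the forward and
# backward LSTMs report identical weight names, so every name repeats.
bidirectional_names = [
    'lstmcell_1_W', 'lstmcell_1_U', 'lstmcell_1_b',
    'lstmcell_2_W', 'lstmcell_2_U', 'lstmcell_2_b', 'lstmcell_3_W',
    'lstmcell_1_W', 'lstmcell_1_U', 'lstmcell_1_b',
    'lstmcell_2_W', 'lstmcell_2_U', 'lstmcell_2_b', 'lstmcell_3_W',
]

print(find_duplicate_names(bidirectional_names))
```

Any non-empty result here means h5py's `create_dataset` will eventually be asked to create the same name twice, which is exactly the "Name already exists" failure in the traceback.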

JiaqingFu avatar Nov 08 '16 15:11 JiaqingFu

@JiaqingFu very useful, thanks!

ishalyminov avatar Nov 08 '16 15:11 ishalyminov

I met the same problem. I used model.get_weights() and model.set_weights(weights) instead of save_weights()/load_weights(), and it works for me.

    # Loading: read each array back in index order and set it on the model
    f = h5py.File(fileName, 'r')
    weights = []
    for i in range(len(f.keys())):
        weights.append(f['weight' + str(i)][:])
    model.set_weights(weights)
    f.close()
    ... ...
    ... ...
    # Saving: store each weight array under an index-based dataset name,
    # which sidesteps the duplicate layer names entirely
    f = h5py.File(fileName, 'w')
    weights = model.get_weights()
    for i in range(len(weights)):
        f.create_dataset('weight' + str(i), data=weights[i])
    f.close()

v-chuqin avatar Nov 16 '16 13:11 v-chuqin

Same problem here. I was trying to train a CNN and while trying to use save_weights or ModelCheckpoint, got the same error.

AakashKumarNain avatar Sep 12 '17 19:09 AakashKumarNain

I had a similar issue. One of my modules followed the naming convention 'modulename/layer_x'. Changing the '/' to '_' resolved the issue. I'm guessing Keras uses '/' as a delimiter and so any layer named with the above convention gets treated as having the same name.
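Since HDF5 (and Keras) treat '/' as a group/path separator, the fix above amounts to flattening the name before it reaches h5py. A minimal, hypothetical sketch (`sanitize_layer_name` is not a Keras API; you would apply it to your own layer names before building the model):

```python
def sanitize_layer_name(name, delimiter='/', replacement='_'):
    """Flatten path-like layer names so HDF5 does not split them into
    nested groups, where two 'modulename/...' layers can collide."""
    return name.replace(delimiter, replacement)

print(sanitize_layer_name('modulename/layer_x'))  # modulename_layer_x
```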

munsanje avatar Jan 28 '18 00:01 munsanje

> Same problem here: #125 So far I've just found out that the error only arises if your AttentionSeq2Seq is bidirectional.

Same issue here. Did you find a solution?

Belginwls avatar Apr 23 '19 05:04 Belginwls

@v-chuqin OMG!!! You are so amazing. I was having this problem with TensorFlow Probability; it wouldn't save my weights, but with your technique it works. The only modification: you have to call model.fit() and let it run for 1 epoch first to initialize the weights, e.g.:

    model.fit(X_train, y_train, epochs=1, verbose=2, batch_size=512,
              validation_data=(X_valid, y_valid), callbacks=[earlystopper])#,mc])
    f = h5py.File('best_TFPROB.h5py', 'r')
    weight = []
    for i in range(len(f.keys())):
        weight.append(f['weight' + str(i)][:])
    model.set_weights(weight)

Thanks!!!!!!!!!!!!!

Quetzalcohuatl avatar Dec 09 '19 21:12 Quetzalcohuatl

It's better to find out why your model has duplicate variable names.

candlewill avatar Dec 23 '20 12:12 candlewill