hierarchical-attention-networks
ValueError in running worker.py
Sorry to bother you again. I used tensorflow 1.2.1 and Python 3.6 and ran worker.py following your instructions, but it hit an error: ValueError: Trying to share variable tcm/word/fw/multi_rnn_cell/cell_0/bn_lstm/w_xh, but specified shape (100, 320) and found shape (200, 320).
I see the same error when I try to run the model, with exactly the same numbers. Is this related to the TF version?
I just downgraded my tensorflow version to 1.1; I found that the implementation of bidirectional_dynamic_rnn was changed in 1.2.
Ok, I'll fix it next week (I stopped using tensorflow around 1.0).
I still encounter the same error on tensorflow 1.4.1. Is there a quick way to fix this?
Didn’t have any time to look at it, sorry! Would be happy to merge your PR; I suspect it’s something trivial.
Has there been a successful fix for this? I am also running into the same issue on tensorflow 1.5.0.
ValueError: Trying to share variable tcm/word/fw/multi_rnn_cell/cell_0/bn_lstm/W_xh, but specified shape (100, 320) and found shape (200, 320).
I encountered this ValueError too.
Environment:
- tensorflow(1.6.0),
- Python 3.6.4.
Exactly the same error here. Environment: tensorflow (1.6.0), Python 3.6.2.
Based on my understanding of this issue on Stack Overflow, I modified the code so that the cells passed to HANClassifierModel as sentence_cell and word_cell become functions that return a cell. I then call these functions when instantiating the bidirectional_rnn at the sentence level and word level. The point is that each of the two cells needed by each bidirectional RNN has to be instantiated as a distinct cell object. So the changes are:
```python
# Define the cell entries as functions
def cell_maker():
    cell = BNLSTMCell(80, is_training)  # h-h batchnorm LSTMCell
    # cell = GRUCell(30)
    return MultiRNNCell([cell] * 5)
```
```python
model = HANClassifierModel(
    vocab_size=vocab_size,
    embedding_size=200,
    classes=classes,
    word_cell=cell_maker,      # pass the function itself (without calling it)
    sentence_cell=cell_maker,  # pass the function itself (without calling it)
    word_output_size=100,
    sentence_output_size=100,
    device=args.device,
    learning_rate=args.lr,
    max_grad_norm=args.max_grad_norm,
    dropout_keep_proba=0.5,
    is_training=is_training,
)
```
then
```python
word_encoder_output, _ = bidirectional_rnn(
    self.word_cell(), self.word_cell(),  # call the function twice here
    word_level_inputs, word_level_lengths,
    scope=scope)
```
and
```python
sentence_encoder_output, _ = bidirectional_rnn(
    self.sentence_cell(), self.sentence_cell(),  # call the function twice here
    sentence_inputs, self.sentence_lengths, scope=scope)
```
It runs now for me, but I can't confirm the performance, since I don't have a GPU to run a complete train/test cycle. Can anyone try it?
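For anyone puzzled by why the factory-function change works, here is a minimal sketch of the idea in plain Python (no TensorFlow required, and `FakeCell` is just a hypothetical stand-in for an RNN cell). TF 1.2+ raises the ValueError when both directions of a bidirectional RNN are built from the same cell object, because that forces them to share one variable scope; passing a zero-argument factory guarantees each direction gets a fresh, independently parameterised cell.

```python
class FakeCell:
    """Stand-in for an RNN cell; each instance would own its own variables."""
    def __init__(self, num_units):
        self.num_units = num_units

def cell_maker():
    # Called once per RNN direction, so no cell object (and hence no
    # variable scope) is ever shared between forward and backward passes.
    return FakeCell(80)

# Instance style (the buggy pattern): both directions reuse one object.
shared = FakeCell(80)
fw_shared, bw_shared = shared, shared
assert fw_shared is bw_shared  # one object -> shared variables -> ValueError in TF >= 1.2

# Factory style (the fix): each call builds a fresh cell.
fw, bw = cell_maker(), cell_maker()
assert fw is not bw  # distinct objects -> distinct variable scopes
```

The same reasoning applies at the word level and sentence level separately, which is why the patch calls `self.word_cell()` and `self.sentence_cell()` twice each.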
@Sora77 - How long did it take you to complete one epoch of training on CPU? @Others - How long did it take you to train on CPU/GPU, and what hardware were you using? For me it currently takes around 20 sec per iteration on a GeForce GTX 970 4GB, which feels pretty slow. I just wanted some idea of what kind of speedup I can expect with better hardware.
I don't have the logs anymore, but I remember it was more or less the same speed you're seeing. Some implementations do not properly leverage the GPU's power. I advise you to replace your tensorflow-gpu library with plain tensorflow and compare for yourself; if the speed is the same, try to look for the bottleneck in the code.
Thanks @Sora77 for the quick response and your suggestions :)