
ValueError in running worker.py

Open jianzhengming opened this issue 7 years ago • 12 comments

Sorry to bother you again. I used tensorflow=1.2.1 and python=3.6 and ran worker.py as per your instructions, but it raised an error: ValueError: Trying to share variable tcm/word/fw/multi_rnn_cell/cell_0/bn_lstm/w_xh, but specified shape (100, 320) and found shape (200, 320).

jianzhengming avatar Aug 21 '17 04:08 jianzhengming

I see the same error when trying to run the model, with exactly the same numbers. Is this related to the TF version?

wzds2015 avatar Sep 26 '17 07:09 wzds2015

I just downgraded my tensorflow version to 1.1. I found that the implementation of bidirectional_dynamic_rnn changed in 1.2.
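
To illustrate what trips the check, here is a toy sketch in plain Python (not TensorFlow itself; the VariableStore class and the shapes are assumptions chosen to mirror the error message): a cell object binds to the variable scope of its first call, so calling the same cell again with a different input size requests a mismatched shape under the same variable name. The 200 corresponds to embedding_size at the word level, the 100 to word_output_size at the sentence level, and 320 = 4 × 80 is W_xh packing the four gates of an 80-unit LSTM.

```python
class VariableStore:
    """Toy stand-in for a TF variable scope: the first get_variable call
    creates the variable; later calls must request the exact same shape."""
    def __init__(self):
        self._shapes = {}

    def get_variable(self, name, shape):
        if name in self._shapes and self._shapes[name] != shape:
            raise ValueError(
                "Trying to share variable %s, but specified shape %s "
                "and found shape %s" % (name, shape, self._shapes[name]))
        self._shapes.setdefault(name, shape)
        return name

store = VariableStore()
gates = 4 * 80  # W_xh packs the 4 gates of an 80-unit LSTM: 320 columns

# Word-level encoder call: inputs are 200-d embeddings.
store.get_variable("tcm/word/fw/multi_rnn_cell/cell_0/bn_lstm/W_xh", (200, gates))

# Reusing the SAME cell object at the sentence level reuses its scope,
# but the inputs there are 100-d word outputs -> shape mismatch.
try:
    store.get_variable("tcm/word/fw/multi_rnn_cell/cell_0/bn_lstm/W_xh", (100, gates))
except ValueError as e:
    print(e)
```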

iloveddobboki avatar Nov 05 '17 04:11 iloveddobboki

Ok, I'll fix it next week (I stopped using tensorflow around 1.0).

ematvey avatar Nov 05 '17 09:11 ematvey

I still encounter the same error on tensorflow 1.4.1. Is there a quick way to fix this?

hendriksc avatar Jan 24 '18 16:01 hendriksc

Didn't have any time to look at it, sorry! Would be happy to merge your PR; I suspect it's something trivial.

ematvey avatar Jan 24 '18 18:01 ematvey

Has there been a successful fix on this? I am also running into the same issue on tensorflow 1.5.0

longvtran avatar Mar 05 '18 17:03 longvtran

ValueError: Trying to share variable tcm/word/fw/multi_rnn_cell/cell_0/bn_lstm/W_xh, but specified shape (100, 320) and found shape (200, 320).

I encountered this ValueError too.

Environment:

  • tensorflow(1.6.0),
  • Python 3.6.4.

HearyShen avatar Mar 09 '18 12:03 HearyShen

Exactly the same error here. Environment: tensorflow (1.6.0), Python 3.6.2.

acadTags avatar Mar 21 '18 15:03 acadTags

Based on my understanding of this issue on Stack Overflow, I modified the code so that the cells passed to HANClassifierModel as sentence_cell and word_cell become functions that return a cell, and I call those functions when instantiating the bidirectional RNN at the sentence level and at the word level. Each of the two cells needed by the two bidirectional RNNs has to be instantiated as a distinct cell object. The changes are:

# Define cell entries as functions so each call builds fresh cells
def cell_maker():
    # h-h batchnorm LSTM cell; build a new one per layer so layers
    # don't share variables (a GRUCell(30) could be swapped in instead)
    return MultiRNNCell([BNLSTMCell(80, is_training) for _ in range(5)])

  model = HANClassifierModel(
      vocab_size=vocab_size,
      embedding_size=200,
      classes=classes,
      word_cell=cell_maker,       # pass the function itself (without calling it)
      sentence_cell=cell_maker,   # pass the function itself (without calling it)
      word_output_size=100,
      sentence_output_size=100,
      device=args.device,
      learning_rate=args.lr,
      max_grad_norm=args.max_grad_norm,
      dropout_keep_proba=0.5,
      is_training=is_training,)

then

word_encoder_output, _ = bidirectional_rnn(
          self.word_cell(), self.word_cell(),  # called the function twice here 
          word_level_inputs, word_level_lengths,
          scope=scope)

and

sentence_encoder_output, _ = bidirectional_rnn(
          self.sentence_cell(), self.sentence_cell(),   # called the function twice here
          sentence_inputs, self.sentence_lengths,
          scope=scope)

It runs for me now, but I can't confirm the performance, since I don't have a GPU to run a complete train/test cycle. Can anyone try it?
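
Why the factory fixes it can be shown with a plain-Python sketch (FakeCell and its weight dict are made up for this illustration, no TensorFlow needed): each call to the factory returns a distinct object with its own weights, whereas a single shared instance keeps the shape fixed by its first caller.

```python
class FakeCell:
    """Toy stand-in for an RNN cell that owns its variables."""
    def __init__(self, num_units):
        self.num_units = num_units
        self.weights = {}  # each instance owns its own variables

    def __call__(self, input_size):
        # W_xh is created on first use; its shape is fixed by that first
        # call, mimicking how a TF cell binds to its first variable scope.
        self.weights.setdefault("W_xh", (input_size, 4 * self.num_units))
        return self.weights["W_xh"]

def cell_maker():
    return FakeCell(80)

# Old pattern: one shared instance serves both encoders, so the
# sentence level still sees the weight shaped by the 200-d word inputs.
shared = cell_maker()
print(shared(200), shared(100))

# Fixed pattern: call the factory once per encoder -> independent weights.
word_cell, sentence_cell = cell_maker(), cell_maker()
print(word_cell(200), sentence_cell(100))
```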

ghazi-f avatar Apr 20 '18 13:04 ghazi-f

@Sora77 - How much time did it take you to complete 1 epoch of training on CPU? @Others - How long did training take you on CPU/GPU, and what hardware were you using? For me it currently takes around 20 sec per iteration on a GeForce GTX 970 4GB, which feels pretty slow. I just wanted some idea of the speed gain I can expect with better hardware.

dugarsumit avatar Aug 28 '18 06:08 dugarsumit

I don't have the logs anymore, but I remember it was more or less the same speed you're seeing. Some implementations don't properly leverage GPU power. I advise you to replace your tensorflow-gpu library with plain tensorflow and compare for yourself. If the speed is the same, try to look for the bottleneck in the code.

ghazi-f avatar Aug 28 '18 06:08 ghazi-f

Thanks @Sora77 for the quick response and your suggestions :)

dugarsumit avatar Aug 28 '18 06:08 dugarsumit