
Problem training Transformer_moe

Open Esaada opened this issue 6 years ago • 2 comments

Description

I followed the instructions and got this error: AttributeError: 'HParams' object has no attribute 'layer_types'

Environment information

OS: Linux Ubuntu 16.04
tensor2tensor==1.8.0
tensorboard==1.10.0
tensorflow==1.10.0
tensorflow-gpu==1.0.1
tensorpack==0.3.0


$ python -V
Python 2.7.12

Steps to reproduce:

I used these parameters:

PROBLEM=librispeech
MODEL=transformer_moe
HPARAMS=transformer_base_single_gpu
DATA_DIR=./t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=./t2t_train/$PROBLEM/$MODEL-$HPARAMS

mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR

t2t-datagen \
  --data_dir=$DATA_DIR \
  --tmp_dir=$TMP_DIR \
  --problem=$PROBLEM

Finally, I ran the training command:

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR

Error logs:

WARNING:tensorflow:Shapes are not fully defined. Assuming batch_size means tokens.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Unsetting shared_embedding_and_softmax_weights.
INFO:tensorflow:Setting T2TModel mode to 'train'
INFO:tensorflow:Using variable initializer: uniform_unit_scaling
INFO:tensorflow:Transforming feature 'inputs' with speech_recognition_modality.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_256_512.targets_bottom
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/function.py:986: calling create_op (from tensorflow.python.framework.ops) with compute_shapes is deprecated and will be removed in a future version.
Instructions for updating:
Shapes are always computed; don't use the compute_shapes as it has no effect.
Traceback (most recent call last):
  File "/usr/local/bin/t2t-trainer", line 32, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/usr/local/bin/t2t-trainer", line 28, in main
    t2t_trainer.main(argv)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py", line 385, in main
    execute_schedule(exp)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/bin/t2t_trainer.py", line 326, in execute_schedule
    getattr(exp, FLAGS.schedule)()
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/trainer_lib.py", line 331, in continuous_train_and_eval
    self._eval_spec)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 451, in train_and_evaluate
    return executor.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 590, in run
    return self.run_local()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/training.py", line 691, in run_local
    saving_listeners=saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 376, in train
    loss = self._train_model(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1145, in _train_model
    return self._train_model_default(input_fn, hooks, saving_listeners)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1170, in _train_model_default
    features, labels, model_fn_lib.ModeKeys.TRAIN, self.config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/estimator/estimator.py", line 1133, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 1184, in wrapping_model_fn
    decode_hparams=decode_hparams)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 1236, in estimator_model_fn
    logits, losses_dict = model(features)  # pylint: disable=not-callable
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/layers/base.py", line 362, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/keras/engine/base_layer.py", line 736, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 190, in call
    sharded_logits, losses = self.model_fn_sharded(sharded_features)
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/utils/t2t_model.py", line 216, in model_fn_sharded
    self._to_single_features_dict(transformed_features))
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/research/transformer_moe.py", line 103, in body_sharded
    encoder_layers, decoder_layers = self._extract_layer_types()
  File "/usr/local/lib/python2.7/dist-packages/tensor2tensor/models/research/transformer_moe.py", line 222, in _extract_layer_types
    layer_types = hparams.layer_types
AttributeError: 'HParams' object has no attribute 'layer_types'

Thanks!

Esaada avatar Oct 08 '18 04:10 Esaada

I think transformer_moe uses a rather different code base from the transformer model. If you use a hyper-parameter set from the transformer code base, it will not contain some of the mandatory hyper-parameters needed to run transformer_moe.

From reading the source code, you will need to add at least the following (partly unused) hyper-parameters:

  from tensor2tensor.models import transformer

  hparams = transformer.transformer_base_single_gpu()

  # Params below are required for transformer_moe to behave the same way as transformer.
  hparams.layer_types = "a/a/a/a/a#a/a/a/a/a"
  hparams.default_att = "a"
  hparams.default_ff = "fc"

  # Params below may not be used, but they need to exist.
  hparams.attention_loc_block_length = 256
  hparams.attention_loc_block_width = 128
  hparams.attention_red_factor = 3
  hparams.attention_red_type = "conv"
  hparams.attention_red_nonlinearity = "none"
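
To actually select this through --hparams_set, one way is to register it as its own hyper-parameter set (for example in a module loaded via --t2t_usr_dir). A rough sketch, assuming the hypothetical set name transformer_moe_base_single_gpu and using add_hparam so the new fields are registered on the HParams object:

  from tensor2tensor.models import transformer
  from tensor2tensor.utils import registry

  @registry.register_hparams
  def transformer_moe_base_single_gpu():
    """transformer_base_single_gpu plus the extra fields transformer_moe reads."""
    hparams = transformer.transformer_base_single_gpu()
    # Architecture string parsed by transformer_moe._extract_layer_types().
    hparams.add_hparam("layer_types", "a/a/a/a/a#a/a/a/a/a")
    hparams.add_hparam("default_att", "a")
    hparams.add_hparam("default_ff", "fc")
    # Possibly unused, but they must exist.
    hparams.add_hparam("attention_loc_block_length", 256)
    hparams.add_hparam("attention_loc_block_width", 128)
    hparams.add_hparam("attention_red_factor", 3)
    hparams.add_hparam("attention_red_type", "conv")
    hparams.add_hparam("attention_red_nonlinearity", "none")
    return hparams

Passing --model=transformer_moe --hparams_set=transformer_moe_base_single_gpu should then at least get past the AttributeError.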

Anyway, if you actually mean to use transformer_moe, you should probably use a hyper-parameter set defined for transformer_moe, such as transformer_moe_2k.

twilightdema avatar Nov 08 '18 14:11 twilightdema

It seems that the feed-forward layer of the moe type has not been implemented. When I use hyper-parameters from transformer_moe, such as transformer_moe_2k,

  with the following architecture:
  * No encoder.
    * Layer 0: a - sep  (self-attention - unmasked separable convolutions)
    * Layer 1: a - sep
    * Layer 2: a - sep
    * Layer 3: a - sep
    * Layer 4: a - sep
  * Decoder architecture:
    * Layer 0: a - a - sepm  (self-attention - enco/deco-attention - masked sep)
    * Layer 1: a - a - sepm
    * Layer 2: a - a - moe  (mixture of expert layers in the middle)
    * Layer 3: a - a - sepm
    * Layer 4: a - a - sepm

I get:

KeyError: "in converted code:\n relative to E:\\workspace\\nmt-train\\tensor2tensor:\n\n utils\\t2t_model.py:326 call\n sharded_logits, losses = self.model_fn_sharded(sharded_features)\n utils\\t2t_model.py:374 model_fn_sharded\n self._to_single_features_dict(transformed_features))\n models\\research\\transformer_moe.py:172 body_sharded\n x = prepostprocess(layers[ff_type])(\n\n KeyError: 'moe'\n"

Roshanson avatar Jul 30 '21 09:07 Roshanson