tensor2tensor
Problem training Transformer_moe
Description
I followed the instructions and got this error: AttributeError: 'HParams' object has no attribute 'layer_types'
Environment information
OS: Linux Ubuntu 16.04
tensor2tensor==1.8.0
tensorboard==1.10.0
tensorflow==1.10.0
tensorflow-gpu==1.0.1
tensorpack==0.3.0
$ python -V
Python 2.7.12
Steps to reproduce:
I used these parameters and commands:
PROBLEM=librispeech
MODEL=transformer_moe
HPARAMS=transformer_base_single_gpu
DATA_DIR=./t2t_data
TMP_DIR=/tmp/t2t_datagen
TRAIN_DIR=./t2t_train/$PROBLEM/$MODEL-$HPARAMS
mkdir -p $DATA_DIR $TMP_DIR $TRAIN_DIR
t2t-datagen \
--data_dir=$DATA_DIR \
--tmp_dir=$TMP_DIR \
--problem=$PROBLEM
Finally, I ran the training command:
t2t-trainer \
--data_dir=$DATA_DIR \
--problem=$PROBLEM \
--model=$MODEL \
--hparams_set=$HPARAMS \
--output_dir=$TRAIN_DIR
Error logs:
WARNING:tensorflow:Shapes are not fully defined. Assuming batch_size means tokens.
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Unsetting shared_embedding_and_softmax_weights.
INFO:tensorflow:Setting T2TModel mode to 'train'
INFO:tensorflow:Using variable initializer: uniform_unit_scaling
INFO:tensorflow:Transforming feature 'inputs' with speech_recognition_modality.bottom
INFO:tensorflow:Transforming 'targets' with symbol_modality_256_512.targets_bottom
WARNING:tensorflow:From /usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/function.py:986: calling create_op (from tensorflow.python.framework.ops) with compute_shapes is deprecated and will be removed in a future version.
Instructions for updating:
Shapes are always computed; don't use the compute_shapes as it has no effect.
Traceback (most recent call last):
File "/usr/local/bin/t2t-trainer", line 32, in
Thanks!
I think transformer_moe uses a quite different code base from the transformer model. If you use a hyper-parameter set from the transformer code base, it will lack some mandatory hyper-parameters that transformer_moe needs in order to run.
From my reading of the source code, you will need to add at least the following hyper-parameters (some of them unused):
from tensor2tensor.models import transformer
hparams = transformer.transformer_base_single_gpu()
# Params below are required in order to have transformer_moe perform the same way as transformer
hparams.layer_types = "a/a/a/a/a#a/a/a/a/a"
hparams.default_att = "a"
hparams.default_ff = "fc"
# Params below may not be used, but need to exist
hparams.attention_loc_block_length = 256
hparams.attention_loc_block_width = 128
hparams.attention_red_factor = 3
hparams.attention_red_type = "conv"
hparams.attention_red_nonlinearity = "none"
That said, if you intend to use transformer_moe, you should probably use a hyper-parameter set defined for transformer_moe, such as transformer_moe_2k.
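Before launching a training run, it can help to check which of these names a hyper-parameter set lacks. The following is only a minimal sketch: the list of names is taken from the reply above and may not be exhaustive, `missing_hparams` is a hypothetical helper (not part of tensor2tensor), and `SimpleNamespace` stands in for a real HParams object.

```python
from types import SimpleNamespace

# Names taken from the reply above; the list may not be exhaustive.
REQUIRED_BY_TRANSFORMER_MOE = (
    "layer_types",
    "default_att",
    "default_ff",
    "attention_loc_block_length",
    "attention_loc_block_width",
    "attention_red_factor",
    "attention_red_type",
    "attention_red_nonlinearity",
)


def missing_hparams(hparams, required=REQUIRED_BY_TRANSFORMER_MOE):
    """Return the required names that `hparams` does not define (hypothetical helper)."""
    return [name for name in required if not hasattr(hparams, name)]


# Stand-in for an HParams object built from a plain transformer set.
# The two attributes here are illustrative values, not read from the library.
base_like = SimpleNamespace(hidden_size=512, num_heads=8)

# Lists every name that would trigger an AttributeError at model-build time.
print(missing_hparams(base_like))
```

Because the check relies only on `hasattr`, the same function should work on any object that exposes hyper-parameters as attributes.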
It seems that the moe feed-forward layer type has not been implemented. When I use a hyper-parameter set from transformer_moe, such as transformer_moe_2k,
with the following architecture:
* No encoder.
* Layer 0: a - sep (self-attention - unmasked separable convolutions)
* Layer 1: a - sep
* Layer 2: a - sep
* Layer 3: a - sep
* Layer 4: a - sep
* Decoder architecture:
* Layer 0: a - a - sepm (self-attention - enco/deco-attention - masked sep)
* Layer 1: a - a - sepm
* Layer 2: a - a - moe (mixture of expert layers in the middle)
* Layer 3: a - a - sepm
* Layer 4: a - a - sepm
I get:
KeyError: in converted code:
 relative to E:\workspace\nmt-train\tensor2tensor:

 utils\t2t_model.py:326 call
 sharded_logits, losses = self.model_fn_sharded(sharded_features)
 utils\t2t_model.py:374 model_fn_sharded
 self._to_single_features_dict(transformed_features))
 models\research\transformer_moe.py:172 body_sharded
 x = prepostprocess(layers[ff_type])(

 KeyError: 'moe'
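The traceback shows the failure coming from a plain dictionary lookup, `layers[ff_type]`, with `ff_type` equal to 'moe'. A rough sketch of how such a mismatch could be caught before the graph is built is shown below. Everything in it is an assumption for illustration: `unknown_layer_types` is a hypothetical helper, and the contents of `available_layers` are made up rather than read from transformer_moe.py.

```python
def unknown_layer_types(requested, available):
    """Return the requested layer-type keys that are absent from `available`."""
    return sorted(set(requested) - set(available))


# Hypothetical registry of feed-forward layer builders, deliberately
# lacking a 'moe' entry. The real dictionary in transformer_moe.py may differ.
available_layers = {
    "fc": lambda x: x,
    "sep": lambda x: x,
    "sepm": lambda x: x,
}

# Feed-forward types of the decoder layers, as listed in the architecture above.
decoder_ff_types = ["sepm", "sepm", "moe", "sepm", "sepm"]

missing = unknown_layer_types(decoder_ff_types, available_layers)
if missing:
    # Reporting up front is easier to read than a KeyError raised mid-build.
    print("Layer types with no registered implementation:", missing)
```

Running a check like this against the actual `layers` dictionary would show whether 'moe' is registered in the installed version.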