
T2T 1.15.7 with TensorFlow 2.2: t2t-trainer produces additional model weights when trained on more than 1 GPU

Open assij opened this issue 5 years ago • 0 comments

Description

When training with t2t 1.15.7 on TensorFlow 2.2 on 1 GPU, the model weights are ~211M. When we increase the number of GPUs, the model weights grow: around 378M with 2 GPUs, up to 1.4G with 8 GPUs.
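If the growth is linear, the reported sizes fit roughly 167M per additional GPU (211 + 7 × 167 ≈ 1380M, close to the 1.4G seen with 8 GPUs). One way to see where the extra weights come from is to list the variables stored in each checkpoint and sum their element counts per name prefix. The following is only a minimal sketch: `num_elements`, `summarize`, and `summarize_checkpoint` are hypothetical helper names, and it assumes the checkpoints can be read with `tf.train.list_variables`.

```python
from collections import defaultdict


def num_elements(shape):
    """Number of scalar values in a tensor of the given shape."""
    n = 1
    for dim in shape:
        n *= dim
    return n


def summarize(variables, depth=1):
    """Sum element counts per variable-name prefix.

    variables: iterable of (name, shape) pairs, the format returned by
               tf.train.list_variables.
    depth:     number of leading '/'-separated name components to group by.
    """
    totals = defaultdict(int)
    for name, shape in variables:
        prefix = "/".join(name.split("/")[:depth])
        totals[prefix] += num_elements(shape)
    return dict(totals)


def summarize_checkpoint(ckpt_path, depth=1):
    """Hypothetical wrapper: read a checkpoint and summarize its variables."""
    # Imported lazily so summarize() can be used without TensorFlow installed.
    import tensorflow as tf
    return summarize(tf.train.list_variables(ckpt_path), depth)
```

Running `summarize_checkpoint` on a checkpoint from the 1-GPU run and on one from the 8-GPU run, then comparing the two dictionaries, should show which prefixes appear only in the multi-GPU checkpoint or account for the size difference.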

...

Environment information

OS: Ubuntu 18.04.4 LTS

$ pip freeze | grep tensor
tensorboard==2.2.2
tensorboard-plugin-wit==1.7.0
tensorflow-addons==0.11.2
tensorflow-datasets==2.1.0
tensorflow-estimator==2.2.0
tensorflow-gan==2.0.0
tensorflow-gpu==2.2.0
tensorflow-hub==0.9.0
tensorflow-metadata==0.23.0
tensorflow-probability==0.7.0

$ python -V
Python 3.6.10 :: Anaconda, Inc

For bugs: reproduction and error logs

# Steps to reproduce:
PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_big
DATA_DIR=$PWD/t2t_data
TMP_DIR=$PWD/t2t_datagen
TRAIN_DIR=$PWD/t2t_train/$PROBLEM/$MODEL-$HPARAMS
BEAM_SIZE=4
ALPHA=0.6
 
 
export PYTHONPATH=${PWD}:$PYTHONPATH
 
 
python3 t2t-trainer --data_dir=$DATA_DIR --problem=$PROBLEM --model=$MODEL --hparams_set=$HPARAMS --output_dir=$TRAIN_DIR/bs3300 --hparams='batch_size=3300' --worker_gpu=8 \
  --keep_checkpoint_max=20 --local_eval_frequency=1000 --train_steps=1000000 --eval_throttle_seconds=3600

# Error logs:
...

assij · Sep 14 '20 05:09