About running the ORQA model
Hi, I have changed the following command so that it runs on a GPU device.
Training on TPU

MODEL_DIR=gs://<YOUR_BUCKET>/<ICT_MODEL_DIR>
TFHUB_CACHE_DIR=gs://<YOUR_BUCKET>/<TFHUB_CACHE_DIR>
export TFHUB_CACHE_DIR=$TFHUB_CACHE_DIR
TPU_NAME=<NAME_OF_TPU>
python -m language.orqa.experiments.ict_experiment \
  --model_dir=$MODEL_DIR \
  --bert_hub_module_path=https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1 \
  --examples_path=gs://orqa-data/enwiki-20181220/examples.tfr \
  --save_checkpoints_steps=1000 \
  --batch_size=4096 \
  --num_train_steps=100000 \
  --tpu_name=$TPU_NAME \
  --use_tpu=True
to this:

python -m language.orqa.experiments.ict_experiment \
  --model_dir=$MODEL_DIR \
  --bert_hub_module_path=https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1 \
  --examples_path=gs://orqa-data/enwiki-20181220/examples.tfr \
  --save_checkpoints_steps=1000 \
  --batch_size=4096 \
  --num_train_steps=100000 \
  --use_tpu=False
The batch size is about 256, but it seems too slow: 100 steps take 8409.323 seconds. I have 4 Tesla V100 GPUs and I don't know whether the configuration is right for this run.
Thank you
We haven't tested this code on a multi-GPU setup. Are you sure it's using all available GPUs?
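One quick way to check is to see which devices TensorFlow reports before training starts. This is a generic TF 2.x snippet, not part of the ORQA code:

import tensorflow as tf

# If this prints an empty list, training silently falls back to CPU
# (for example when CUDA/cuDNN is missing or mismatched).
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
print("Built with CUDA:", tf.test.is_built_with_cuda())

Watching nvidia-smi during a training run also shows whether more than one GPU is actually being utilized.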
Hi kentonl:
I have checked it; it was running on CPUs because I hadn't installed cuDNN. I have since tested on GPUs, but it was only running on one GPU (I have 4 GPUs on one machine). I don't know how to configure it to use all GPUs.
Thank you
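In case it helps anyone with the same question: ict_experiment is written around a TPU-style Estimator, so as far as I can tell it won't spread work across local GPUs on its own. Below is a minimal sketch of how multi-GPU data parallelism is usually wired up with tf.distribute.MirroredStrategy and a plain tf.estimator.Estimator. The function name build_multi_gpu_estimator and the idea of plugging ORQA's model_fn into it are assumptions for illustration, not something the repository provides:

import tensorflow as tf

def build_multi_gpu_estimator(model_fn, model_dir, save_checkpoints_steps=1000):
  # MirroredStrategy replicates the model on every visible GPU and performs
  # synchronous (all-reduce) gradient updates across them.
  strategy = tf.distribute.MirroredStrategy()
  print("Replicas in sync:", strategy.num_replicas_in_sync)

  run_config = tf.estimator.RunConfig(
      model_dir=model_dir,
      save_checkpoints_steps=save_checkpoints_steps,
      train_distribute=strategy)

  # model_fn must be a standard Estimator model_fn returning an EstimatorSpec;
  # ORQA's model_fn returns a TPUEstimatorSpec, so it would need converting.
  return tf.estimator.Estimator(model_fn=model_fn, config=run_config)

Note that the global batch size is split across replicas, so with 4 GPUs each device sees batch_size / 4 examples per step.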