About running the ORQA model
Hi, I have changed the following command so that it runs on a GPU device.
Training on TPU

MODEL_DIR=gs://<YOUR_BUCKET>/<ICT_MODEL_DIR>
TFHUB_CACHE_DIR=gs://<YOUR_BUCKET>/<TFHUB_CACHE_DIR>
export TFHUB_CACHE_DIR=$TFHUB_CACHE_DIR
TPU_NAME=<NAME_OF_TPU>
python -m language.orqa.experiments.ict_experiment \
  --model_dir=$MODEL_DIR \
  --bert_hub_module_path=https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1 \
  --examples_path=gs://orqa-data/enwiki-20181220/examples.tfr \
  --save_checkpoints_steps=1000 \
  --batch_size=4096 \
  --num_train_steps=100000 \
  --tpu_name=$TPU_NAME \
  --use_tpu=True
to this:

python -m language.orqa.experiments.ict_experiment \
  --model_dir=$MODEL_DIR \
  --bert_hub_module_path=https://tfhub.dev/google/bert_uncased_L-12_H-768_A-12/1 \
  --examples_path=gs://orqa-data/enwiki-20181220/examples.tfr \
  --save_checkpoints_steps=1000 \
  --batch_size=4096 \
  --num_train_steps=100000 \
  --use_tpu=False
The batch size is about 256, but it seems too slow: 100 steps take 8409.323 seconds. I have 4 Tesla V100 GPUs and I don't know whether the configuration is right for this run.
Thank you
We haven't tested this code on a multi-GPU setup. Are you sure it's using all available GPUs?
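One quick way to check is to see which devices TensorFlow reports before training starts. This is a generic TF 2.x snippet, not part of the ORQA code:

import tensorflow as tf

# If this prints an empty list, training silently falls back to CPU
# (for example when CUDA/cuDNN is missing or mismatched).
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
print("Built with CUDA:", tf.test.is_built_with_cuda())

Watching nvidia-smi during a training run also shows whether more than one GPU is actually being utilized.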
Hi kentonl:
I have checked it; it was running on CPUs because I hadn't installed cuDNN. I have since tested on GPUs, but it was only running on one GPU (I have 4 GPUs on one machine). I don't know how to configure it to use all GPUs.
Thank you
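In case it helps anyone with the same question: ict_experiment is written around a TPU-style Estimator, so as far as I can tell it won't spread work across local GPUs on its own. Below is a minimal sketch of how multi-GPU data parallelism is usually wired up with tf.distribute.MirroredStrategy and a plain tf.estimator.Estimator. The function name build_multi_gpu_estimator and the idea of plugging ORQA's model_fn into it are assumptions for illustration, not something the repository provides:

import tensorflow as tf

def build_multi_gpu_estimator(model_fn, model_dir, save_checkpoints_steps=1000):
  # MirroredStrategy replicates the model on every visible GPU and performs
  # synchronous (all-reduce) gradient updates across them.
  strategy = tf.distribute.MirroredStrategy()
  print("Replicas in sync:", strategy.num_replicas_in_sync)

  run_config = tf.estimator.RunConfig(
      model_dir=model_dir,
      save_checkpoints_steps=save_checkpoints_steps,
      train_distribute=strategy)

  # model_fn must be a standard Estimator model_fn returning an EstimatorSpec;
  # ORQA's model_fn returns a TPUEstimatorSpec, so it would need converting.
  return tf.estimator.Estimator(model_fn=model_fn, config=run_config)

Note that the global batch size is split across replicas, so with 4 GPUs each device sees batch_size / 4 examples per step.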