THUMT
THUMT copied to clipboard
about the time for train a model
how much time for train a model in single 32G GPU and use the defult parameters? I feel it is very slow, in my GPU , it is spend 3.7 seconds for one step. is it normal?
@Rooders Please check whether the update_cycle is set to 1, if yes, then I think the training speed is abnormal. Usually, each training step is less than 1 second with the default parameters (model=Transformer,update_cycle=1,device_list=[0],batch_size=4096). The most possible reason is that your training program has run with the CPU rather than the GPU. Please make sure the device_list is set to the index of the GPU you are going to use.
@Rooders Please check whether the update_cycle is set to 1, if yes, then I think the training speed is abnormal. Usually, each training step is less than 1 second with the default parameters (model=Transformer,update_cycle=1,device_list=[0],batch_size=4096). The most possible reason is that your training program has run with the CPU rather than the GPU. Please make sure the device_list is set to the index of the GPU you are going to use.
Sorry, my defult parametser are that advicing best parameters in UserManual.pdf . They are update_cycle=4,batch_size=6250. But I just followed your advice and set update_cycle=1,batch_size=4096, device_list=[0],it is still slow, each training step about 2.6 seconds. At this training, My GPU is a single Tesla P40 22G. I have checked this device index and it is available.but it didn't use GPU to training, whether the Tensorflow-version is wrong ? my Tenserflow-Version is tensorflow-gpu=1.15
@Rooders Please check whether the update_cycle is set to 1, if yes, then I think the training speed is abnormal. Usually, each training step is less than 1 second with the default parameters (model=Transformer,update_cycle=1,device_list=[0],batch_size=4096). The most possible reason is that your training program has run with the CPU rather than the GPU. Please make sure the device_list is set to the index of the GPU you are going to use.
Sorry, my defult parametser are that advicing best parameters in UserManual.pdf . They are update_cycle=4,batch_size=6250. But I just followed your advice and set update_cycle=1,batch_size=4096, device_list=[0],it is still slow, each training step about 2.6 seconds. At this training, My GPU is a single Tesla P40 22G. I have checked this device index and it is available.but it didn't use GPU to training, whether the Tensorflow-version is wrong ? my Tenserflow-Version is tensorflow-gpu=1.15
The THUMT-TensorFlow can be run with TensorFlow-gpu=1.15. You can run a simple Tensorflow-GPU program (maybe a matrix multiplication operation) to check whether it can use the GPU. If not, you should check the CUDA version and the Driver version to make sure they are matched.
@Rooders Please check whether the update_cycle is set to 1, if yes, then I think the training speed is abnormal. Usually, each training step is less than 1 second with the default parameters (model=Transformer,update_cycle=1,device_list=[0],batch_size=4096). The most possible reason is that your training program has run with the CPU rather than the GPU. Please make sure the device_list is set to the index of the GPU you are going to use.
Sorry, my defult parametser are that advicing best parameters in UserManual.pdf . They are update_cycle=4,batch_size=6250. But I just followed your advice and set update_cycle=1,batch_size=4096, device_list=[0],it is still slow, each training step about 2.6 seconds. At this training, My GPU is a single Tesla P40 22G. I have checked this device index and it is available.but it didn't use GPU to training, whether the Tensorflow-version is wrong ? my Tenserflow-Version is tensorflow-gpu=1.15
The THUMT-TensorFlow can be run with TensorFlow-gpu=1.15. You can run a simple Tensorflow-GPU program (maybe a matrix multiplication operation) to check whether it can use the GPU. If not, you should check the CUDA version and the Driver version to make sure they are matched.
thank u very mach, the issue have be solved, it is because CUDA version dosen't match Tensorflow version. By the way, if I set update_cycle=1,batch_size=4096, how many BLEU score I can get in valid corpus? and training model in zh-en 200 millions sentence-pair?
@Rooders Sorry, we did not record the BLEU scores under this setting.