VAE-Text-Generation

Training is very slow on Intel Xeon 2.3 GHz, CUDA not working

Open · sharan21 opened this issue 3 years ago · 1 comment

I am using a Google Colab VM with an Nvidia K80 GPU and an Intel Xeon CPU. When I run python3 main.py, training starts but seems to be very slow: after 1 hour, not even 1 epoch appears to have completed. Is that normal?

Also, running nvidia-smi from the command line shows 0% GPU utilization, so the GPU isn't being used at all. I suspected this was because the default gpu_device is set to 1 in main.py, while my GPU has an id of 0. After changing this parameter, I get an error during training.

I did not copy the error, but in essence it says that not all of the vectors are on the same device. I am assuming this is because of a bug in the code that fails to send some vector to the GPU with id = 0. I will paste the error as soon as I run into it again and can reproduce the problem.

So, finally, I have two questions: 1. Is training on the CPU meant to be this slow? 2. How do I change the GPU id to 0 without causing an error? Thanks in advance.
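
For reference, here is a minimal sketch of what I would expect the device handling to look like, assuming the project is built on PyTorch (the issue itself does not confirm this). The helper `resolve_device` and the toy `Linear` model are hypothetical and not taken from main.py. The idea is to build one device string from the requested GPU id, fall back when that id does not exist, and move both the model and every input batch to that same device.

```python
def resolve_device(requested_id: int, device_count: int) -> str:
    """Return a device string for a requested GPU index (hypothetical helper).

    Falls back to GPU 0 when the requested index does not exist,
    and to the CPU when no GPU is visible at all.
    """
    if device_count <= 0:
        return "cpu"
    if 0 <= requested_id < device_count:
        return f"cuda:{requested_id}"
    return "cuda:0"


# PyTorch is an assumption here; without it the sketch only resolves the string.
try:
    import torch
    visible_gpus = torch.cuda.device_count() if torch.cuda.is_available() else 0
except ImportError:
    torch = None
    visible_gpus = 0

# A default of 1 (as described for gpu_device) resolves to cuda:0 on a single-GPU VM.
device = resolve_device(requested_id=1, device_count=visible_gpus)
print("using device:", device)

if torch is not None:
    # Toy stand-ins for the real model and data, both moved to the same device.
    model = torch.nn.Linear(4, 2).to(device)
    batch = torch.randn(8, 4).to(device)
    output = model(batch)
    print("output lives on:", output.device)
```

If any tensor is created inside the training loop without being moved to `device`, a device mismatch like the one described above would be the likely result, so that is probably the place to look.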

sharan21, Mar 31 '21 18:03