pyg_autoscale

pytorch cuda streams parallel

Open · wawltor opened this issue 4 years ago · 3 comments

Hi, this is great work for large-scale GNN training, thank you. I have a question: it seems that CUDA streams cannot run in parallel in PyTorch, as discussed in https://github.com/pytorch/pytorch/issues/25540. Are there any tricks for this in PyGAS?

wawltor · Jul 22 '21

The usage of CUDA streams can parallelize host-to-device memory transfers (via `*.to(device, non_blocking=True)`) and actual GPU kernel execution. That's exactly what we are using them for.
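For illustration, here is a minimal sketch of that overlap pattern in plain PyTorch (tensor names and sizes are placeholders, not the actual PyGAS code):

```python
import torch

# Minimal sketch of overlapping a host-to-device copy with GPU compute
# by issuing the copy on a side CUDA stream.
copy_stream = torch.cuda.Stream()

# Pinned (page-locked) host memory is required for truly asynchronous copies.
cpu_batch = torch.randn(4096, 256).pin_memory()
gpu_work = torch.randn(4096, 256, device='cuda')

with torch.cuda.stream(copy_stream):
    # Asynchronous host-to-device transfer on the side stream.
    gpu_batch = cpu_batch.to('cuda', non_blocking=True)

# Meanwhile, kernels on the default stream keep the GPU busy.
out = gpu_work @ gpu_work.t()

# Make the default stream wait for the copy before using `gpu_batch`.
torch.cuda.current_stream().wait_stream(copy_stream)
result = out[:, :256] + gpu_batch
```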

From what I know, it doesn't seem possible to parallelize GPU kernel executions using multiple CUDA streams in PyTorch; at least I haven't had any success with that yet.

rusty1s · Jul 22 '21

[screenshot of the copy code] @rusty1s is there a problem in this code? The dst tensor is in CUDA memory and the src tensor is in CPU memory, but the copy direction is cudaMemcpyDeviceToHost.
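To illustrate the direction being questioned (a sketch in plain PyTorch, not the library's actual low-level code): copying from a CPU tensor into a CUDA tensor is a host-to-device transfer, so the underlying async memcpy would be expected to use cudaMemcpyHostToDevice.

```python
import torch

# Illustrative only: the direction semantics of the copy under discussion.
# dst lives on the GPU, src lives in pinned CPU memory, so the underlying
# async memcpy corresponds to host-to-device, not device-to-host.
src = torch.randn(1024, 64).pin_memory()      # host (CPU) tensor
dst = torch.empty(1024, 64, device='cuda')    # device (CUDA) tensor

stream = torch.cuda.Stream()
with torch.cuda.stream(stream):
    dst.copy_(src, non_blocking=True)         # asynchronous host-to-device copy
stream.synchronize()
```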

wawltor · Jul 30 '21

Thanks for reporting. This is indeed wrong, and I have fixed it. Luckily, I can confirm that it has been working correctly anyway.

rusty1s · Jul 30 '21