Any workarounds for improving speed on V100 GPUs?

Open ricshaw opened this issue 8 months ago • 3 comments

Inference on a V100 GPU is roughly 30x slower than what is reported for an H100 (~30 seconds for 50 images on a V100 vs. ~1 second on an H100). I guess this is mainly because flash attention is not supported and the architecture is older. Does anyone know a way to improve inference speed on a V100?

ricshaw, Apr 29 '25 09:04

Hi @ricshaw ,

This speed looks unreasonable to me. Can you check the discussion in this issue? It is quite possible that the ~30 seconds you mentioned includes the data loading time, which is bound by I/O rather than by the GPU.

https://github.com/facebookresearch/vggt/issues/21
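
To check whether data loading dominates the total, the two stages can be timed separately. The snippet below is only a minimal sketch and not the timing script from the linked issue: `time_call`, `load_images`, and `run_model` are hypothetical names, and the two stage functions are `time.sleep` stand-ins that would be replaced by the real loading and forward-pass code.

```python
import time


def time_call(fn, sync=None, warmup=1, repeats=3):
    """Return the mean wall-clock seconds of fn() over `repeats` runs.

    `sync` is an optional zero-argument callable that blocks until all
    queued device work has finished. It is called before the clock
    starts and again before it stops.
    """
    for _ in range(warmup):
        fn()  # warm-up runs are excluded from the measurement
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    if sync is not None:
        sync()
    return (time.perf_counter() - start) / repeats


# Hypothetical stand-ins: replace with the real loading and inference code.
def load_images():
    time.sleep(0.05)  # pretend disk I/O and preprocessing


def run_model():
    time.sleep(0.01)  # pretend forward pass


load_s = time_call(load_images)
infer_s = time_call(run_model)
print(f"data loading: {load_s:.4f} s, inference: {infer_s:.4f} s")
```

On a CUDA device, passing `sync=torch.cuda.synchronize` matters because kernels are launched asynchronously, so a timer stopped without synchronizing can report only the launch overhead.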

jytime, Apr 29 '25 23:04

Thanks for your response! Using your timing script with the same example images I get:

images [20, 3, 350, 518] = 1.9080 s
images [60, 3, 350, 518] = 11.7003 s
images [80, 3, 350, 518] = 19.7562 s

But with my images which get cropped and resized to 518x518, I get:

images [20, 3, 518, 518] = 3.8108 s
images [60, 3, 518, 518] = 26.1346 s
images [80, 3, 518, 518] = 44.7516 s
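
One way to read these two sets of timings is to compare the slowdown against the growth in token count. The sketch below assumes a ViT patch size of 14 pixels, which is not stated in this thread, and uses only the numbers quoted above. If attention cost dominated, runtime would grow roughly with the square of the number of tokens per image. This is a back-of-the-envelope check, not a profile.

```python
# Timings quoted above, in seconds, keyed by number of images.
t_350 = {20: 1.9080, 60: 11.7003, 80: 19.7562}  # inputs of 350x518
t_518 = {20: 3.8108, 60: 26.1346, 80: 44.7516}  # inputs of 518x518

PATCH = 14  # assumption: patch size in pixels, not confirmed in this thread


def tokens_per_image(height, width, patch=PATCH):
    """Number of patch tokens for one image (ignoring any extra tokens)."""
    return (height // patch) * (width // patch)


tok_350 = tokens_per_image(350, 518)
tok_518 = tokens_per_image(518, 518)
token_ratio = tok_518 / tok_350

print(f"tokens per image: {tok_350} vs {tok_518} (ratio {token_ratio:.2f})")
print(f"ratio if cost were quadratic in tokens: {token_ratio ** 2:.2f}")

# Measured slowdown from 350x518 to 518x518 at each image count.
measured = {n: t_518[n] / t_350[n] for n in sorted(t_350)}
for n, ratio in measured.items():
    print(f"{n} images: measured slowdown {ratio:.2f}x")
```

Under that assumption the larger inputs carry about 1.5x as many tokens per image, and the measured slowdowns of roughly 2.0x to 2.3x sit close to the square of that ratio. That is consistent with attention being a large share of the runtime, though it does not prove it.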

ricshaw, Apr 29 '25 23:04

I see, this seems better but still quite slow. Unfortunately I do not have access to a V100, so I am not sure how to accelerate it, but I guess someone has tried to find a way to use flash attention on such devices.

jytime, Apr 30 '25 00:04