Any workarounds for improving speed on V100 GPUs?
Doing inference on a V100 GPU is roughly 30x slower than what is reported for an H100 (~30 seconds for 50 images on a V100 vs ~1 second on an H100). I guess this is mainly because flash attention is not supported and the V100 is generally an older architecture... Does anyone know a way to improve speed on a V100?
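For context, inference here is the usual no_grad + autocast pattern, roughly like the sketch below; `model` and `images` are placeholders for the VGGT model and the preprocessed image tensor. On a V100 (compute capability 7.0) autocast has to fall back to float16, since bf16 is not supported there.

```python
import torch

# Sketch only: `model` and `images` stand for the VGGT model and a
# preprocessed [S, 3, H, W] image tensor already on the GPU.
# Pre-Ampere cards (compute capability < 8.0) have no bf16 support,
# so fall back to float16 on the V100.
dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16

with torch.no_grad():
    with torch.autocast(device_type="cuda", dtype=dtype):
        predictions = model(images)
```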
Hi @ricshaw,
This speed looks unreasonable to me. Can you check the discussion in the issue below? It is quite possible that the ~30 seconds you mentioned included the data loading time, which is bound by disk I/O rather than the GPU.
https://github.com/facebookresearch/vggt/issues/21
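For what it's worth, a rough way to check is to time loading and the forward pass separately, synchronizing the GPU before reading the clock. A sketch, where `load_and_preprocess_images` and `model` are placeholders for whatever loading utility and model object you are using:

```python
import time
import torch

t0 = time.perf_counter()
# I/O + preprocessing (placeholder loader); move the batch to the GPU up front
images = load_and_preprocess_images(image_paths).to("cuda")
torch.cuda.synchronize()
t1 = time.perf_counter()

with torch.no_grad():
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        predictions = model(images)
torch.cuda.synchronize()  # wait for the GPU before stopping the clock
t2 = time.perf_counter()

print(f"loading: {t1 - t0:.3f} s, inference: {t2 - t1:.3f} s")
```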
Thanks for your response! Using your timing script with the same example images I get:
images [20, 3, 350, 518] = 1.9080 s
images [60, 3, 350, 518] = 11.7003 s
images [80, 3, 350, 518] = 19.7562 s

But with my images, which get cropped and resized to 518x518, I get:
images [20, 3, 518, 518] = 3.8108 s
images [60, 3, 518, 518] = 26.1346 s
images [80, 3, 518, 518] = 44.7516 s
I see, this seems better but still quite slow. Unfortunately I do not have access to a V100, so I am not sure how to accelerate it, but perhaps someone has already found a way to use flash attention (or a substitute for it) on such devices.
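One option that might be relevant (untested on a V100 here, and it only matters if the attention layers actually go through `torch.nn.functional.scaled_dot_product_attention`): ask PyTorch to use the memory-efficient SDPA backend, which does run on pre-Ampere GPUs, instead of FlashAttention. A sketch, again with `model` and `images` as placeholders:

```python
import torch
from torch.nn.attention import SDPBackend, sdpa_kernel  # requires PyTorch >= 2.3

# Restrict SDPA to backends available on pre-Ampere GPUs like the V100:
# memory-efficient attention, with the plain math kernel as a fallback.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=torch.float16):
    with sdpa_kernel([SDPBackend.EFFICIENT_ATTENTION, SDPBackend.MATH]):
        predictions = model(images)
```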