
Update 4_compile.py

Open msaroufim opened this issue 2 years ago • 4 comments

I haven't run this yet, but I included 2 important caveats that will have big performance implications - will run this next and see what else we can debug

msaroufim · Mar 17 '23 16:03

Awesome, thanks for jumping in here. Would love to get some insights on how to improve that. I should mention that I used CUDA 11.8.

Let me try the sample batch idea!

rasbt · Mar 17 '23 16:03

Ah, your batch size is also quite small, so it might be best to try out torch.compile(m, mode="reduce-overhead"), which will automatically enable CUDA graphs for you.

I recently added some docs to make most of this clearer: https://pytorch.org/docs/master/compile/index.html
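
For context, a minimal sketch of what that suggestion looks like; the model and batch shapes below are placeholders, not the ones from 4_compile.py:

```python
import torch
import torch.nn as nn

# Placeholder model and batch; substitute the actual ones from the blog script.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
batch = torch.randn(16, 128, device="cuda")  # small batch -> launch overhead dominates

# "reduce-overhead" enables CUDA graphs under the hood, which amortizes the
# kernel-launch overhead that dominates at small batch sizes.
compiled_model = torch.compile(model, mode="reduce-overhead")

# The first call triggers compilation (and graph capture), so it is slow;
# time only the subsequent calls.
with torch.no_grad():
    for _ in range(3):
        _ = compiled_model(batch)
```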

msaroufim · Mar 17 '23 17:03

Thanks, I tried that originally and it didn't really help :(.

rasbt · Mar 17 '23 17:03

Yeah, it should be used in combination with tensor cores.
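
Assuming "tensor cores" here means allowing TF32 matmuls, a hedged sketch of combining the two settings (placeholder model; not necessarily what the blog script ends up using):

```python
import torch
import torch.nn as nn

# Allow TF32 tensor cores for float32 matmuls (effective on Ampere+ GPUs).
torch.set_float32_matmul_precision("high")

# Placeholder model; substitute the actual one from the blog script.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()

# Combine tensor cores with reduce-overhead (CUDA graphs) for the small-batch case.
compiled_model = torch.compile(model, mode="reduce-overhead")
```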

msaroufim · Mar 17 '23 17:03