faster-pytorch-blog Update 4

Haven't run this yet, but I included 2 important caveats that will have a big performance impact. Will run it next and see what else we can debug.

Mar 17 '23 16:03 msaroufim

Awesome, thanks for jumping in here. Would love to get some insights on how to improve that. I should mention that I used CUDA 11.8.

Let me try the sample batch idea!

Mar 17 '23 16:03 rasbt

Ah, your batch size is also quite small, so it might be best to try torch.compile(m, mode="reduce-overhead"), which will automatically enable CUDA graphs for you.

Recently added some docs to make most of this clearer: https://pytorch.org/docs/master/compile/index.html
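
For reference, a minimal sketch of what that suggestion could look like. The small model and the batch shape are hypothetical stand-ins, since the actual model from the script is not shown in this thread; only the torch.compile(m, mode="reduce-overhead") call comes from the message above.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; the real `m` from the script is not shown here.
m = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# Wrapping is lazy: the actual compilation happens on the first forward pass.
compiled_m = torch.compile(m, mode="reduce-overhead")

if torch.cuda.is_available():
    m.cuda()  # moves the parameters in place; compiled_m wraps the same module
    x = torch.randn(8, 64, device="cuda")  # small batch, as in the discussion
    with torch.no_grad():
        # The first few calls include compile and graph-capture cost,
        # so exclude them from any timing.
        for _ in range(3):
            out = compiled_m(x)
    print(out.shape)
```

When benchmarking, it is probably worth timing only the iterations after the warm-up calls, since the first calls pay the one-time compilation cost.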

Mar 17 '23 17:03 msaroufim

Thanks, I tried that originally and it didn't really help :(.

Mar 17 '23 17:03 rasbt

Yeah, it should be used in combination with tensor cores.
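
A rough sketch of one way to do that, assuming "tensor cores" here refers to allowing TF32 for float32 matmuls (the message does not say which setting is meant). The tensors are hypothetical placeholders.

```python
import torch

# Assumption: enable TF32 for float32 matmuls so supported GPUs can use
# tensor cores. The default setting is "highest", which keeps full float32.
torch.set_float32_matmul_precision("high")

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical placeholder tensors, just to exercise a matmul.
a = torch.randn(8, 16, device=device)
b = torch.randn(16, 8, device=device)
y = a @ b

print(torch.get_float32_matmul_precision(), y.shape)
```

This trades a bit of matmul precision for speed, so it is worth checking that accuracy is unaffected for the model in question.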

Mar 17 '23 17:03 msaroufim

Update 4_compile.py