
How to cache the compilation result?

huntzhan opened this issue on Dec 12, 2023 · 2 comments

torch.compile always recompiles a function from scratch in a new Python session, which takes a lot of time. I'm wondering if there's a way to cache the compilation result on the file system (the way gcc/clang do) to speed up the development and debugging loop. @Chillee

https://github.com/pytorch-labs/gpt-fast/blob/db7b273ab86b75358bd3b014f1f022a19aba4797/generate.py#L16-L18
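For context, a minimal sketch of the cost being described; the `step` function, shapes, and `mode` flag here are illustrative stand-ins, not taken from gpt-fast. Every fresh Python process pays the full torch.compile cost again on the first call:

```python
import time
import torch

# Hypothetical stand-in for the compiled decode step in generate.py.
def step(x):
    return torch.softmax(x @ x.T, dim=-1)

compiled = torch.compile(step, mode="reduce-overhead")
x = torch.randn(512, 512, device="cuda")

t0 = time.perf_counter()
compiled(x)  # first call in a fresh session triggers full compilation
torch.cuda.synchronize()
print(f"cold call: {time.perf_counter() - t0:.1f}s")

t0 = time.perf_counter()
compiled(x)  # later calls in the same session reuse the compiled code
torch.cuda.synchronize()
print(f"warm call: {time.perf_counter() - t0:.4f}s")
```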

huntzhan · Dec 12 '23, 09:12

Unfortunately, this is a known issue right now. In theory, it's possible to use AOTInductor (talk: https://www.youtube.com/watch?v=w7d4oWzwZ0c) to compile everything ahead of time, although it's somewhat finicky to use.
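For reference, a rough sketch of what an AOTInductor flow can look like. `torch._export.aot_compile` and `torch._export.aot_load` are private APIs whose names and signatures have shifted across PyTorch releases, so treat this as an assumption about recent builds rather than a stable recipe:

```python
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x @ x)

model = Toy().eval().cuda()
example_args = (torch.randn(8, 8, device="cuda"),)

# One-time ahead-of-time compile: produces a shared library on disk.
# (Private API; name/signature vary by PyTorch version.)
so_path = torch._export.aot_compile(model, example_args)

# In a later session, load the .so instead of recompiling from scratch.
runner = torch._export.aot_load(so_path, device="cuda")
out = runner(*example_args)
```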

We also have some plans to offer an easier way to cache compilation results.

To be clear, a number of components are already cached across sessions: Triton autotuning decisions, Inductor compilation results, etc. A warm recompile typically takes me on the order of 30-40 seconds, although we should certainly try to drive that down even further.
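As a hedged illustration of keeping those caches warm across sessions: the `TORCHINDUCTOR_CACHE_DIR` environment variable controls where Inductor writes its on-disk cache, while the `fx_graph_cache` flag only exists on newer builds, so its availability is an assumption here:

```python
import os

# Persist Inductor's on-disk cache (Triton autotuning results, compiled
# kernels) somewhere that survives across Python sessions.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = os.path.expanduser("~/.cache/inductor")

import torch
import torch._inductor.config

# On newer PyTorch builds, Inductor can also cache whole FX-graph
# compilations; guard the flag since it may not exist on this version.
if hasattr(torch._inductor.config, "fx_graph_cache"):
    torch._inductor.config.fx_graph_cache = True

fn = torch.compile(lambda x: torch.sin(x) + torch.cos(x))
fn(torch.randn(1024))  # a warm recompile reuses the cached artifacts above
```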

Chillee · Dec 17 '23, 01:12

Thanks for the reply.

huntzhan · Dec 18 '23, 14:12