gpt-fast
                        How to cache the compilation result?
torch.compile always re-compiles a function from scratch in a new Python session, which takes a lot of time.
I'm wondering if there's a way to cache the compilation result on the file system (the way gcc/clang cache object files) to speed up development and debugging.
@Chillee
https://github.com/pytorch-labs/gpt-fast/blob/db7b273ab86b75358bd3b014f1f022a19aba4797/generate.py#L16-L18
This is currently an issue we're aware of, unfortunately. In theory, it's possible to use AOTInductor (https://www.youtube.com/watch?v=w7d4oWzwZ0c) to compile everything fully ahead of time, but it's somewhat finicky to use.
We also have some plans to offer an easier way to cache compilation results.
To be clear, a number of components are already cached across recompiles: Triton autotuning decisions, Inductor compilation, etc. A warm recompile typically takes me on the order of 30-40 seconds, although we should certainly try to drive this down even further.
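For reference, the warm-cache behavior described above can be influenced through environment variables. A minimal sketch, assuming a PyTorch version where Inductor's on-disk caches are available (the exact set of supported variables depends on your PyTorch build, so treat this as illustrative):

```shell
# Point Inductor's compilation cache at a persistent directory so artifacts
# survive across Python sessions (by default it lives under /tmp).
export TORCHINDUCTOR_CACHE_DIR="$HOME/.cache/torchinductor"

# Enable the FX graph cache, which lets Inductor skip recompiling
# previously seen graphs in a new process (available in newer builds).
export TORCHINDUCTOR_FX_GRAPH_CACHE=1

# Then run the script as usual; the second cold start should be faster.
python generate.py --compile
```

Note that this warms the Inductor/Triton layers only; Dynamo tracing still runs in each new session, which is why a "warm" start is faster but not instant.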
Thanks for the reply.