flux
Add torch.compile option to CLI
Currently, the inference optimization solution (TensorRT) takes a very long time to compile, the UX is not great, and it doesn't support LoRA.
Compared to TensorRT, torch.compile has several big advantages:
- It compiles relatively fast (about 100 seconds).
- It provides a ~60% speedup over eager mode (measured on an H100; other GPUs should see significant speedups as well).
- It supports LoRA (or any other kinds of model changes that people want to make).
We should encourage users to prefer torch.compile over TensorRT, to get the faster compile times and LoRA support.
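A minimal sketch of how the CLI option could work, assuming a `--compile` flag and a placeholder `build_model` helper (both hypothetical; the real flux CLI and model loading will differ):

```python
# Hypothetical sketch: wiring a --compile flag into a CLI and wrapping
# the model with torch.compile. Flag name and build_model are assumptions.
import argparse

import torch


def build_model() -> torch.nn.Module:
    # Stand-in for loading the real flux transformer.
    return torch.nn.Sequential(torch.nn.Linear(8, 8), torch.nn.GELU())


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--compile",
        action="store_true",
        help="wrap the model with torch.compile before inference",
    )
    args = parser.parse_args(argv)

    model = build_model()
    if args.compile:
        # The first forward pass triggers compilation (on the order of
        # 100 seconds for the full model); subsequent calls reuse the
        # compiled graph. LoRA weights merged into the model beforehand
        # are compiled along with it.
        model = torch.compile(model)

    x = torch.randn(1, 8)
    return model(x)


if __name__ == "__main__":
    main()
```

Because torch.compile traces the model as-is, LoRA or other user modifications applied before compilation are picked up automatically, which is the key difference from a prebuilt TensorRT engine.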