
Add torch.compile option to CLI

Open yf225 opened this issue 8 months ago • 0 comments

Currently, the inference optimization solution (TensorRT) takes a very long time to compile, its UX is not great, and it doesn't support LoRA.

Compared to TensorRT, torch.compile has several advantages:

  1. It compiles relatively fast (100 seconds).
  2. It provides a 60% speedup over eager mode (measured on an H100; other GPUs should also see substantial speedups).
  3. It supports LoRA (or any other kind of model change that people want to make).

We should encourage users to prefer torch.compile over TensorRT for its faster compile time and LoRA support.
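A minimal sketch of what such a CLI option could look like, assuming an argparse-based entry point. The flag names `--compile` and `--compile-mode` and the helper `maybe_compile` are hypothetical and not taken from this repo; only `torch.compile(model, mode=...)` is an existing PyTorch call.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI; the real entry point may be structured differently.
    parser = argparse.ArgumentParser(description="inference CLI (sketch)")
    parser.add_argument(
        "--compile",
        action="store_true",
        help="wrap the model with torch.compile before running inference",
    )
    parser.add_argument(
        "--compile-mode",
        default="default",
        choices=["default", "reduce-overhead", "max-autotune"],
        help="value passed to torch.compile(mode=...)",
    )
    return parser


def maybe_compile(model, enabled: bool, mode: str = "default"):
    """Return the model unchanged, or wrapped with torch.compile if enabled."""
    if not enabled:
        return model
    # Imported lazily so the eager path does not pay the import cost here.
    import torch

    return torch.compile(model, mode=mode)
```

With this shape, eager mode stays the default and compilation is opt-in, e.g. `model = maybe_compile(model, args.compile, args.compile_mode)` after the model (and any LoRA weights) are loaded.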

yf225 • Mar 17 '25 23:03