
Torch 2.0 compile model


Are there any plans to add torch.compile speed-ups to LMQL Transformers models? Thanks

andrecharneca avatar Dec 01 '23 11:12 andrecharneca

Hi Andre, can you recommend any resources on how torch.compile improves inference speed with, e.g., transformers models?

In general I am definitely not opposed to adding it.

lbeurerkellner avatar Dec 10 '23 14:12 lbeurerkellner

For example: https://huggingface.co/docs/transformers/main/perf_torch_compile . Although that page benchmarks Vision Transformers, the results should be similar for language models. In my own experimentation with torch.compile on LLMs, compilation can take quite a while, so the performance gains really depend on the specific use case. It would still be a nice feature to add, since enabling it is so simple.
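To illustrate the pattern being discussed: with torch >= 2.0, wrapping a model in torch.compile is a one-line change. The sketch below uses a tiny stand-in module instead of a real Hugging Face model so it runs quickly, and backend="eager" so it needs no C++ toolchain; in real use you would keep the default (inductor) backend, which is what yields the speedups at the cost of a slow first forward pass.

```python
import torch
import torch.nn as nn

# Tiny stand-in for a transformers model; with Hugging Face models the
# call is the same shape:
#   model = torch.compile(AutoModelForCausalLM.from_pretrained(...))
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 16))

# backend="eager" keeps this sketch cheap and dependency-free; the default
# inductor backend is what actually speeds up inference, but its first
# forward pass pays the compilation cost mentioned above.
compiled = torch.compile(model, backend="eager")

x = torch.randn(4, 16)
with torch.no_grad():
    y = compiled(x)  # first call triggers (here: trivial) compilation
print(y.shape)
```

Note that compilation is lazy: nothing happens at the torch.compile call itself, so a server would want a warm-up forward pass before accepting requests.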

andrecharneca avatar Dec 13 '23 13:12 andrecharneca

Marking this as a good first issue.

The feature can be added in https://github.com/eth-sri/lmql/blob/main/src/lmql/models/lmtp/backends/transformers_model.py, where an optional lmql serve-model argument could be exposed so that compilation happens before model serving begins.
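A minimal sketch of what such an opt-in hook could look like in the backend. The helper name, the flag, and the guard are all assumptions for illustration, not LMQL's actual API; the real change would live in transformers_model.py and read the flag from the serve-model arguments.

```python
import torch

def maybe_compile(model, enabled: bool, mode: str = "default"):
    """Hypothetical helper: compile the loaded model before serving starts.

    `enabled` would come from an (assumed) lmql serve-model flag; `mode`
    maps to torch.compile's mode argument (e.g. "reduce-overhead").
    """
    if not enabled:
        return model
    if not hasattr(torch, "compile"):
        # torch.compile requires torch >= 2.0; fall back gracefully.
        print("torch.compile unavailable; serving uncompiled model")
        return model
    # Compilation itself is lazy: it runs on the first forward pass, so the
    # backend should do a warm-up generation before accepting requests.
    return torch.compile(model, mode=mode)

# Disabled path: the model is returned untouched.
base = torch.nn.Linear(8, 8)
served = maybe_compile(base, enabled=False)
```

Doing this once at load time, rather than per request, matters because torch.compile caches per input shape; recompiling inside the serving loop would negate the benefit.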

lbeurerkellner avatar Feb 27 '24 14:02 lbeurerkellner