Torch 2.0 compile model
Are there any plans to add torch.compile speed-ups to LMQL Transformers models? Thanks
Hi there Andre, can you recommend any resources on how torch.compile improves inference speed, e.g. with transformers?
In general I am definitely not opposed to adding it.
For example: https://huggingface.co/docs/transformers/main/perf_torch_compile. Although this benchmarks Vision Transformers, the results should be similar for LLMs. From my own experimentation with torch.compile, compilation of LLMs can take quite a while, so the performance gains really depend on the specific use case. It would still be a nice feature to add, since it's so simple.
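For reference, a minimal sketch of applying torch.compile to a Hugging Face transformers model; the model name and generation parameters are placeholders, not LMQL code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works the same way
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# torch.compile wraps the model in an optimized module; compilation is
# triggered lazily on the first forward pass and can take a while for LLMs.
model = torch.compile(model)

inputs = tokenizer("Hello, my name is", return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```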
Marking this as a good first issue.
The feature can be added in https://github.com/eth-sri/lmql/blob/main/src/lmql/models/lmtp/backends/transformers_model.py by introducing an optional `lmql serve-model` argument, so that compilation is done before model serving begins.
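As a rough illustration of the idea (not the actual transformers_model.py internals; the `compile` argument name and the `load_model` helper are assumptions for this sketch):

```python
import torch
from transformers import AutoModelForCausalLM

def load_model(model_identifier: str, **kwargs):
    # Hypothetical opt-in flag, e.g. passed via `lmql serve-model ... compile=True`.
    compile_model = kwargs.pop("compile", False)
    model = AutoModelForCausalLM.from_pretrained(model_identifier, **kwargs)
    if compile_model:
        # Compile up front so the (potentially long) compilation cost is paid
        # once, before the server starts accepting requests.
        model = torch.compile(model)
    return model
```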