Orion
vLLM support?
The docs mention that you used vLLM for inference, but it looks like Orion support hasn't been upstreamed yet: https://github.com/vllm-project/vllm/tree/main/vllm/model_executor/models
Could you share the model file, or do you have an ETA for upstreaming the code? Inference with HF transformers is slow enough to make Orion pretty much unusable, even just for running evals.