vllm
Custom base path on FastAPI server
Hello everyone,
We have a situation here in which we are deploying our LLMs behind an ALB on AWS using a path prefix; in this case the model is deployed on a path that looks like `my.domain.com/my-model`. By default, ALBs on AWS don't support path rewriting, so the request is forwarded to the API server with the `/my-model` prefix still on it, breaking the API because that URL doesn't exist there: the API only listens on the `/generate` path. I implemented a custom API server here, but I think we could add an optional parameter (or env var) to set the base path of the server (keeping the default behavior if it is not set). I can open a PR doing this, but I wanted to confirm that this makes sense for other people before I actually do it.
Any update here?
FastAPI supports this through the `--root-path` parameter (passed to uvicorn), but it'd have to be exposed through vLLM's API server.
https://fastapi.tiangolo.com/advanced/behind-a-proxy/
Ran into the same issue. Here's how I run a similar FastAPI service behind an ALB:
```yaml
command: [ "uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000", "--root-path", "/my-model" ]
```
This parameter would need to be propagated into the FastAPI server.
I'm considering just biting the bullet and standing up a second load balancer just to serve vLLM, because AWS sucks that much.
@hughesadam87 I've been using Kubernetes, so I just changed my route to be `my-model.mydomain.com` instead of `api.mydomain.com/my-model`. No need to create another ALB, but it is a pain in the ass because I didn't want a full domain for each model. Another option is to do what I did: create a custom Python server that is identical to the one used by vLLM and change the routes to have your prefix; that worked too.
Ideally we could just have an env var on vLLM that adds the prefix to the routes when set.