
Multimodel inference

Open bacoco opened this issue 2 years ago • 4 comments

It would be great if we could load multiple models in the same container and switch between them by model name.

bacoco avatar Oct 23 '23 14:10 bacoco

Great suggestion. I have already thought about this, but it makes the server harder to monitor and measure (throughput etc.), especially via the /ready and /metrics routes.

For now, I would suggest starting up two servers at the same time using /bin/sh -c .. logic.

Docker

For Docker you can do e.g.

# Dockerfile for multiple models via multiple ports
FROM michaelf34/infinity:latest
ENTRYPOINT ["/bin/sh", "-c", \
 "(/opt/poetry/bin/poetry run infinity_emb --port 8080 --model-name-or-path sentence-transformers/all-MiniLM-L6-v2 &);\
 (/opt/poetry/bin/poetry run infinity_emb --port 8081 --model-name-or-path intfloat/e5-large-v2 )"]

You can run it via

docker build -t custominfinity . && docker run -it -p 8080:8080 -p 8081:8081 custominfinity

Don't forget to add GPUs / cache dir etc.
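To call the two servers from a client, you could route requests by model name. A minimal sketch of that routing (the model-to-port mapping mirrors the Dockerfile above; the /embeddings route is an assumption about the server API, adapt it to the actual endpoint):

```python
# Client-side router: pick the right server port by model name.
# MODEL_PORTS mirrors the two-server Dockerfile above; the
# "/embeddings" route is an assumed endpoint, not guaranteed.
MODEL_PORTS = {
    "sentence-transformers/all-MiniLM-L6-v2": 8080,
    "intfloat/e5-large-v2": 8081,
}

def embeddings_url(model: str, host: str = "localhost") -> str:
    """Return the URL of the server hosting `model`."""
    try:
        port = MODEL_PORTS[model]
    except KeyError:
        raise ValueError(f"unknown model: {model}") from None
    return f"http://{host}:{port}/embeddings"

print(embeddings_url("intfloat/e5-large-v2"))
# -> http://localhost:8081/embeddings
```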

michaelfeil avatar Oct 23 '23 16:10 michaelfeil

FYI, there is now a high-level Python API. You can now build your own FastAPI server according to your needs.
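In such a custom server, the multi-model part boils down to dispatching by model name to the right loaded engine. A minimal, framework-free sketch of that dispatch logic (engines are stubbed as plain callables here; in a real server they would be the engines loaded via the Python API):

```python
# Sketch of model-name dispatch for a custom multi-model server.
# Engines are stubbed as callables; a real server would register
# the embedding engines loaded through the high-level Python API.
from typing import Callable, Dict, List

class ModelRegistry:
    """Map model names to embedding engines and dispatch by name."""

    def __init__(self) -> None:
        self._engines: Dict[str, Callable[[List[str]], List[List[float]]]] = {}

    def register(self, name: str, engine: Callable) -> None:
        self._engines[name] = engine

    def embed(self, model: str, sentences: List[str]) -> List[List[float]]:
        if model not in self._engines:
            raise KeyError(f"model not loaded: {model}")
        return self._engines[model](sentences)

registry = ModelRegistry()
# Stub engine returning zero-vectors, standing in for a real model.
registry.register("mini", lambda xs: [[0.0] * 3 for _ in xs])
print(registry.embed("mini", ["hello", "world"]))
# -> [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
```

A FastAPI route handler would then just look up the `model` field of the request body in the registry.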

michaelfeil avatar Dec 18 '23 11:12 michaelfeil

@bacoco Taking suggestions for design patterns for this feature. If there is a good one or the feature is very wanted, I might implement it #151

michaelfeil avatar Mar 17 '24 01:03 michaelfeil

I think launching this from the command line is basically not possible: it would mean launching multiple models through the typer CLI, where most args are auto-generated.

Do you think a CLI setup from a .yaml file would be a good option? @bacoco
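One hypothetical shape such a .yaml file could take (field names are invented for illustration, not an implemented format):

```yaml
# Hypothetical multi-model launch config (illustrative only)
port: 8080
models:
  - model-name-or-path: sentence-transformers/all-MiniLM-L6-v2
    batch-size: 32
  - model-name-or-path: intfloat/e5-large-v2
    batch-size: 16
```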

michaelfeil avatar Mar 29 '24 06:03 michaelfeil

Hey all, it's fully implemented. @bacoco

michaelfeil avatar May 20 '24 00:05 michaelfeil