Multi-model inference
It would be great if we could load multiple models in the same container and switch between them by model name.
Great suggestion. I have already thought about this, but it makes the server harder to monitor and to measure throughput for, especially via the /ready and /metrics routes.
For now, I would suggest starting two servers at the same time using /bin/sh -c ... logic.
Docker
For Docker, you can do e.g.:
# Dockerfile for multiple models via multiple ports
FROM michaelf34/infinity:latest
ENTRYPOINT ["/bin/sh", "-c", \
"(/opt/poetry/bin/poetry run infinity_emb --port 8080 --model-name-or-path sentence-transformers/all-MiniLM-L6-v2 &);\
(/opt/poetry/bin/poetry run infinity_emb --port 8081 --model-name-or-path intfloat/e5-large-v2 )"]
You can run it via
docker build -t custominfinity . && docker run -it -p 8080:8080 -p 8081:8081 custominfinity
Don't forget to add GPUs, a cache dir, etc.
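For example, extending the run command above (assuming the image keeps its model cache under /app/.cache; adjust the mount path if your image differs):
docker run -it --gpus all -v $PWD/infinity_cache:/app/.cache -p 8080:8080 -p 8081:8081 custominfinity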
FYI, there is now a high-level Python API. You can now build your own FastAPI server according to your needs.
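A minimal sketch of how that looks, assuming the AsyncEmbeddingEngine / EngineArgs interface from the current docs (exact names may differ across versions):
# Sketch: use the high-level Python API instead of the bundled server.
# Assumes AsyncEmbeddingEngine / EngineArgs as exposed in recent releases.
import asyncio
from infinity_emb import AsyncEmbeddingEngine, EngineArgs

engine = AsyncEmbeddingEngine.from_args(
    EngineArgs(model_name_or_path="sentence-transformers/all-MiniLM-L6-v2", engine="torch")
)

async def main():
    async with engine:  # starts/stops the internal batching loop
        embeddings, usage = await engine.embed(sentences=["Hello world"])
    print(len(embeddings), usage)

asyncio.run(main())
The same engine object can then be called from your own FastAPI route handlers.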
@bacoco Taking suggestions for design patterns for this feature. If there is a good one, or the feature is in high demand, I might implement it. #151
I think launching this from the command line is basically not possible as things stand. It would mean launching multiple models via the typer CLI, where most args are auto-generated.
Do you think a CLI setup from a .yaml file would be a good option? @bacoco
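Purely as a sketch of what I mean (nothing here exists; the file layout and loading code are hypothetical), such a .yaml could map to one EngineArgs per model:
# Hypothetical sketch only: load per-model settings from a YAML file
# into EngineArgs objects. The "models:" layout is an assumption,
# not an existing infinity feature.
import yaml  # PyYAML
from infinity_emb import EngineArgs

# models.yaml (hypothetical):
# models:
#   - model_name_or_path: sentence-transformers/all-MiniLM-L6-v2
#     batch_size: 64
#   - model_name_or_path: intfloat/e5-large-v2
#     batch_size: 16
with open("models.yaml") as f:
    cfg = yaml.safe_load(f)

engine_args = [EngineArgs(**m) for m in cfg["models"]]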
Hey all, it's fully implemented. @bacoco
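For example, with a recent version (assuming the v2 CLI with a repeatable --model-id flag; check the current docs for the exact flags), both models can be served from one container and selected by model name:
infinity_emb v2 --model-id sentence-transformers/all-MiniLM-L6-v2 --model-id intfloat/e5-large-v2 --port 7997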