Eric Curtin

Results: 479 comments of Eric Curtin

But only CPU 😢 Linux aarch64 CPU inference works too.

Off the top of my head, no; try reaching out to the skopeo, buildah, or podman artifact people.

Could you try building within a podman-machine Linux VM? It appears as though this is an attempt to build for Linux on macOS.

We will enable pushing and pulling models to OCI registries in RamaLama once the new "podman artifact" command is complete: https://github.com/containers/ramalama
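
A rough sketch of what that workflow could look like once the feature lands; the command names, argument order, and the oci:// prefix here are assumptions, not the final CLI:

```python
# Sketch only: wrapping hypothetical RamaLama push/pull commands for models
# stored in an OCI registry. Exact commands and transport syntax may differ.
import subprocess

def push_model(model: str, target: str) -> None:
    # e.g. push_model("smollm:135m", "oci://quay.io/example/smollm:135m")
    subprocess.run(["ramalama", "push", model, target], check=True)

def pull_model(source: str) -> None:
    # e.g. pull_model("oci://quay.io/example/smollm:135m")
    subprocess.run(["ramalama", "pull", source], check=True)
```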

@qcu266 @NPhMKgbDNy1M @chlins @alex-vg we have this in Docker Model Runner now and we've put effort into cleaning up the GitHub repo to make it more contributor-friendly, please star,...

We have just the llama.cpp server and vllm server integrated, but are open to the llama-cpp-python server also.

This is how it should look when we have multiple models:

```json
{
  "object": "list",
  "data": [
    {"id": "smollm:360m", "object": "model", "created": 1737073177, "owned_by": "library"},
    {"id": "smollm:135m", "object": "model", "created": 1736344249, "owned_by": "library"}
  ]
}
```
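
A minimal sketch of checking that response, assuming an OpenAI-compatible server (llama-server or vllm) is listening locally; the base URL is an assumption:

```python
# List the models an OpenAI-compatible /v1/models endpoint reports.
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed server address, adjust as needed

with urllib.request.urlopen(f"{BASE_URL}/v1/models") as resp:
    payload = json.load(resp)

# With multiple models loaded, "data" should contain one entry per model,
# e.g. smollm:360m and smollm:135m as in the response above.
for model in payload["data"]:
    print(model["id"], model["owned_by"])
```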

@sallyom I think we should consider this: https://github.com/BerriAI/litellm. It's compatible with vllm and llama.cpp and sits in front of either (whereas llama-cpp-python only gives us a llama.cpp solution). vllm closed...

https://docs.litellm.ai/docs/providers/vllm
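
A minimal sketch of fronting an already-running vllm (or llama.cpp) OpenAI-compatible server through litellm; the model name, the hosted_vllm/ provider prefix, and the server address are assumptions based on the linked docs:

```python
# Route one request through litellm to an OpenAI-compatible backend.
import litellm

response = litellm.completion(
    model="hosted_vllm/smollm:135m",      # assumed model identifier and prefix
    api_base="http://localhost:8000/v1",  # assumed vllm/llama-server URL
    messages=[{"role": "user", "content": "Say hello."}],
)
print(response.choices[0].message.content)
```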

After reading a bit, I'm going to propose we write our own proxy; litellm won't provide what we need. You can proxy/route/bridge (whichever terminology you prefer) requests to pre-spun-up llama-servers...
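
A minimal sketch of that routing idea, assuming each llama-server is already running on its own port and requests carry an OpenAI-style "model" field; the model names, ports, and listen address are illustrative assumptions:

```python
# Inspect the "model" field of an incoming request and forward it to the
# matching pre-spun-up llama-server backend.
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Map of model name -> backend llama-server base URL (assumed ports).
BACKENDS = {
    "smollm:135m": "http://localhost:8081",
    "smollm:360m": "http://localhost:8082",
}

class ProxyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        model = json.loads(body).get("model", "")
        backend = BACKENDS.get(model)
        if backend is None:
            self.send_error(404, f"unknown model: {model}")
            return
        # Forward the request unchanged to the selected llama-server.
        req = urllib.request.Request(
            backend + self.path, data=body,
            headers={"Content-Type": "application/json"}, method="POST",
        )
        with urllib.request.urlopen(req) as resp:
            status = resp.status
            payload = resp.read()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), ProxyHandler).serve_forever()
```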