Eric Curtin
Two issues with this at present:

1. Pulls from non-public locations: nvcr.io/nvidia/cuda:12.4.1-devel-ubi9
2. Can't find where these images are published in general.
> > 1. Pulls from non-public locations: nvcr.io/nvidia/cuda:12.4.1-devel-ubi9
>
> this is not necessarily an issue, in case a token-based auth can be given to the OCI registry?

...
We also have to think of ways of auto-detecting the primary GPU (that's kinda separate from this issue); I have an idea of how to do that for AMD GPUs,...
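For what it's worth, here's a minimal sketch of one way AMD detection could work on Linux: check for the ROCm `/dev/kfd` device node and fall back to scanning PCI vendor IDs (0x1002 is AMD's vendor ID). The function name and overall approach are just an illustration, not the actual implementation:

```python
# Sketch only: detect a usable AMD GPU on Linux by checking for the ROCm
# kernel driver device node, falling back to a PCI vendor-ID scan.
import os
import glob

def has_amd_gpu() -> bool:
    # /dev/kfd is created by the amdgpu/ROCm kernel driver
    if os.path.exists("/dev/kfd"):
        return True
    # Fall back to scanning PCI devices for AMD's vendor ID (0x1002)
    for vendor_file in glob.glob("/sys/bus/pci/devices/*/vendor"):
        try:
            with open(vendor_file) as f:
                if f.read().strip() == "0x1002":
                    return True
        except OSError:
            continue
    return False

if __name__ == "__main__":
    print("AMD GPU detected" if has_amd_gpu() else "No AMD GPU found")
```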
@tarilabs I'm also unsure if the instructlab team plans on maintaining/publishing those in future, so maybe we should create our own...
Here's another image I was pointed towards that will be useful: https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/llm-servers/vllm/gpu/Containerfile

This will be a useful reference for our image with the vLLM runtime. It's UBI9-based, which is exactly...
Nvidia maintains UBI9-based CUDA images, so that will help us quite a bit: https://hub.docker.com/r/nvidia/cuda
> I would prefer to go with the python route.

I agree. The main problem we have right now is that this "--instruct" option in llama.cpp direct was very useful for...
Tagging @abetlen; we also sent an email with more details to [email protected]
Yeah... To be honest at this point, if we do add this, it will probably be just another --runtime, like --runtime llama-cpp-python
llama-cpp-python does appear to implement a more feature-complete OpenAI-compatible server than the direct llama.cpp one, but I don't know for sure: https://llama-cpp-python.readthedocs.io/en/latest/server/

It also implements multi-model...
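As a rough illustration of what that OpenAI-compatible surface gives us, a client could talk to a locally running llama-cpp-python server with the standard `openai` package. This is just a sketch assuming the server was started with something like `python -m llama_cpp.server --model ./model.gguf` on the default port 8000; the model name below is a placeholder, not something from this thread:

```python
# Sketch only: assumes a llama-cpp-python server is already running locally
# on the default port 8000 and the `openai` package (>= 1.0) is installed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # llama-cpp-python's OpenAI-compatible endpoint
    api_key="not-needed",                 # the local server does not require a real key by default
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; with multi-model support this would select the model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```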