Eric Curtin
Two issues with this at present:

1. Pulls from non-public locations: nvcr.io/nvidia/cuda:12.4.1-devel-ubi9
2. Can't find where these images are published in general.
> > 1. Pulls from non-public locations: nvcr.io/nvidia/cuda:12.4.1-devel-ubi9
>
> this is not necessarily an issue, in case a token-based auth can be given to the OCI registry?

...
We also have to think of ways of auto-detecting the primary GPU (that's kinda separate from this issue); I have an idea of how to do that for AMD GPUs,...
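For what it's worth, here's a minimal sketch of one way AMD detection could work on Linux: check for the ROCm `/dev/kfd` device node and fall back to scanning PCI vendor IDs (0x1002 is AMD's vendor ID). The function name and overall approach are just an illustration, not the actual implementation:

```python
# Sketch only: detect a usable AMD GPU on Linux by checking for the ROCm
# kernel driver device node, falling back to a PCI vendor-ID scan.
import os
import glob

def has_amd_gpu() -> bool:
    # /dev/kfd is created by the amdgpu/ROCm kernel driver
    if os.path.exists("/dev/kfd"):
        return True
    # Fall back to scanning PCI devices for AMD's vendor ID (0x1002)
    for vendor_file in glob.glob("/sys/bus/pci/devices/*/vendor"):
        try:
            with open(vendor_file) as f:
                if f.read().strip() == "0x1002":
                    return True
        except OSError:
            continue
    return False

if __name__ == "__main__":
    print("AMD GPU detected" if has_amd_gpu() else "No AMD GPU found")
```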
@tarilabs I'm also unsure if the instructlab team plans on maintaining/publishing those in future, so maybe we should create our own...
Here's another image I was pointed towards that will be useful: https://github.com/rh-aiservices-bu/llm-on-openshift/blob/main/llm-servers/vllm/gpu/Containerfile

This will be a useful reference for our image with the vLLM runtime. It's UBI9-based, which is exactly...
Nvidia maintains UBI9-based CUDA images, so that will help us quite a bit: https://hub.docker.com/r/nvidia/cuda
> I would prefer to go with the python route.

I agree. The main problem we have right now is that this "--instruct" option in llama.cpp direct was very useful for...
Tagging @abetlen; we also sent an email with more details to [email protected]
Yeah... To be honest at this point, if we do add this, it will probably be just another --runtime, like --runtime llama-cpp-python
llama-cpp-python does appear to implement a more feature-complete OpenAI-compatible server than the direct llama.cpp one, but I don't know for sure: https://llama-cpp-python.readthedocs.io/en/latest/server/

It also implements multi-model...
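As a rough illustration of what that OpenAI-compatible surface gives us, a client could talk to a locally running llama-cpp-python server with the standard `openai` package. This is just a sketch assuming the server was started with something like `python -m llama_cpp.server --model ./model.gguf` on the default port 8000; the model name below is a placeholder, not something from this thread:

```python
# Sketch only: assumes a llama-cpp-python server is already running locally
# on the default port 8000 and the `openai` package (>= 1.0) is installed.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # llama-cpp-python's OpenAI-compatible endpoint
    api_key="not-needed",                 # the local server does not require a real key by default
)

response = client.chat.completions.create(
    model="local-model",  # placeholder; with multi-model support this would select the model
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```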