Apparently Docker doesn't allow it; going with RAMALAMA, as the container is sort of redundant anyway.
We removed support for gfx9, which covers roughly the AMD GPUs prior to 2019, so the image can fit in CI and is easier to download. This saves us a ton of...
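For reference, a minimal sketch of what the restricted ROCm build could look like, assuming the image builds llama.cpp with cmake; the flag names and the exact post-gfx9 target list below are assumptions, not the final set:

    # Build llama.cpp's ROCm backend only for newer AMD GPU targets,
    # dropping gfx9 so the resulting image stays small enough for CI.
    cmake -B build \
        -DGGML_HIPBLAS=ON \
        -DAMDGPU_TARGETS="gfx1010;gfx1030;gfx1100;gfx1101;gfx1102"
    cmake --build build --config Release -j"$(nproc)"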
We should be close to Docker support, but let's get it tested regularly via GitHub CI.
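Something like the following smoke test could run in the CI job; the RAMALAMA_CONTAINER_ENGINE variable and the model name are assumptions used only to illustrate the shape of the test:

    # Smoke test: force the docker backend and exercise a couple of basic commands.
    set -euxo pipefail
    export RAMALAMA_CONTAINER_ENGINE=docker   # assumed engine-selection mechanism
    ramalama --help
    ramalama pull tinyllama                   # small model so the job stays fast
    ramalama list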
zsh is the default shell on macOS. Autocomplete is not working, as reported by @rhatdan.
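A possible workaround to verify in ~/.zshrc, assuming the completions come from argcomplete on top of the argparse CLI (which is an assumption about how ramalama wires this up):

    # Enable bash-style completion in zsh, then register ramalama's completer.
    autoload -U bashcompinit && bashcompinit
    eval "$(register-python-argcomplete ramalama)"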
If there is a way to auto-detect between language model files and ASR model files, we should do that; if that's not possible, we should just use a runtime...
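As a rough sketch of the auto-detect idea, assuming the models are GGUF files and that ASR models would carry a whisper-style architecture key (both assumptions), something like this could pick the runtime:

    # Inspect the GGUF metadata and branch on general.architecture.
    model="$1"
    arch=$(gguf-dump "$model" | grep -m1 'general.architecture')   # gguf-dump from the gguf Python package
    case "$arch" in
      *whisper*) runtime="whisper.cpp" ;;
      *)         runtime="llama.cpp"   ;;
    esac
    echo "would serve $model with $runtime"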
Use UBI9 if possible.

latest-cuda will go something like this:

    dnf config-manager --add-repo https://developer.download.nvidia.com/compute/cuda/repos/rhel9/x86_64/cuda-rhel9.repo
    dnf install -y libcudnn8 nvidia-driver-NVML nvidia-driver-cuda-libs

Build with GGML_CUDA=1 (see the sketch after these notes).

latest-rocm will go something like this: https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/native-install/rhel.html...
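For the GGML_CUDA=1 step, the build inside the latest-cuda image would look roughly like this; it assumes llama.cpp is built from source with cmake in the image, so treat it as a sketch rather than the final Containerfile contents:

    # Build llama.cpp with the CUDA backend enabled.
    cmake -B build -DGGML_CUDA=1
    cmake --build build --config Release -j"$(nproc)"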
Discussed here:
https://github.com/containers/ramalama/issues/239
https://github.com/abetlen/llama-cpp-python/blob/7c4aead82d349469bbbe7d8c0f4678825873c039/docs/server.md#configuration-and-multi-model-support
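For context, a sketch of the multi-model setup from the linked llama-cpp-python server docs; the CONFIG_FILE mechanism and config schema are assumed from that page, and the model paths and aliases are placeholder examples:

    # Write a config listing more than one model, then point the server at it.
    cat > config.json <<'EOF'
    {
      "models": [
        { "model": "models/llama-3-8b.Q4_K_M.gguf", "model_alias": "llama3" },
        { "model": "models/mistral-7b.Q4_K_M.gguf", "model_alias": "mistral" }
      ]
    }
    EOF
    CONFIG_FILE=config.json python3 -m llama_cpp.server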
Right now we call llama.cpp directly; long-term we should go with either llama.cpp directly or llama-cpp-python, because maintaining two different llama.cpp backends isn't ideal: they will never be in sync...
We should consolidate our efforts with instructlab and share container base images: https://github.com/instructlab/instructlab/tree/main/containers
quay.io can only automatically build x86_64. aarch64 is important; this is tested to work on Apple Silicon/macOS with podman machine and libkrun. We just need to figure out a way...
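Until then, one option is to cross-build both arches ourselves and push a manifest list; the image name and tag below are just examples, and cross-building assumes qemu-user-static is set up on the build host:

    # Build for both architectures into a single manifest list, then push it.
    podman build --platform linux/amd64,linux/arm64 \
        --manifest quay.io/ramalama/ramalama:latest .
    podman manifest push quay.io/ramalama/ramalama:latest \
        docker://quay.io/ramalama/ramalama:latest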