llama-stack
docker image name and GPU issue
I am using the latest version; I just pip installed 0.0.45. In my environment (Fedora 39) I have export DOCKER_BINARY="podman". When I build, pretty much following the example, I get this image: localhost/distribution-my-local-stack:latest. I checked whether this image was ever used in a container with "podman container ps -a", and the answer is no containers at all.
When I use "llama stack run", I eventually see this command on the screen:

podman run -it -p 5000:5000 -v /home/sgrubb/.llama/builds/docker/my-local-stack-run.yaml:/app/config.yaml llamastack-my-local-stack python -m llama_stack.distribution.server.server --yaml_config /app/config.yaml --port 5000

It then offers to download llamastack-my-local-stack:latest from various registries.
Why did it not use the local image? Could it be that the built image is named with a distribution- prefix while the run command looks for a llamastack- prefix?
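As a workaround (my own guess, not a documented fix), retagging the locally built image to the name the run command expects should let podman resolve it locally instead of trying registries:

```shell
# Confirm the name mismatch: the built image carries a distribution- prefix
podman images --format "{{.Repository}}:{{.Tag}}" | grep my-local-stack

# Retag it to the name "llama stack run" constructs
podman tag localhost/distribution-my-local-stack:latest \
    llamastack-my-local-stack:latest
```

This is only a band-aid; the real fix is making the build and run paths agree on one image name.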
Also, I chose vllm as the inference provider. It downloaded and installed nvidia drivers during the build. Once I corrected the command to use the distribution- image name, it dies with "RuntimeError: Failed to infer device type." That's when I noticed that the GPUs had not been passed through. If nvidia drivers are loaded, the command needs --device nvidia.com/gpu=all added to it. It also wouldn't hurt to add --cgroup-conf=memory.high=32g, or something configurable, to cap memory use.
Also, the command should probably include --rm so that the ephemeral container created by each invocation is removed on exit.
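Putting the suggestions together, the generated command might look like this (the 32g memory limit is just an example value, and the image name and paths are from my setup):

```shell
podman run -it --rm \
    --device nvidia.com/gpu=all \
    --cgroup-conf=memory.high=32g \
    -p 5000:5000 \
    -v /home/sgrubb/.llama/builds/docker/my-local-stack-run.yaml:/app/config.yaml \
    localhost/distribution-my-local-stack:latest \
    python -m llama_stack.distribution.server.server \
    --yaml_config /app/config.yaml --port 5000
```

With --device nvidia.com/gpu=all the container sees the GPUs (so vllm can infer a device type), and --rm keeps exited containers from piling up.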
In summary: there is a naming mismatch between the image that gets built and the image the run command looks for, and GPUs are not being passed through to the container.