
docker image name and GPU issue

stevegrubb opened this issue 1 year ago • 3 comments

I am using the latest version, which I just pip installed (0.0.45). In my environment (Fedora 39) I have export DOCKER_BINARY="podman". When I build, pretty much following the example, I get this image: localhost/distribution-my-local-stack:latest. I checked whether this image was used in a container with "podman container ps -a", and the answer is no containers at all.

When I use "llama stack run", I eventually see this command on the screen:

podman run -it -p 5000:5000 -v /home/sgrubb/.llama/builds/docker/my-local-stack-run.yaml:/app/config.yaml llamastack-my-local-stack python -m llama_stack.distribution.server.server --yaml_config /app/config.yaml --port 5000

It then offers to download llamastack-my-local-stack:latest from various registries.

Why did it not use the local image? Could it be it has distribution instead of llamastack?

Also, I chose vllm as the inference service. It downloaded and installed nvidia drivers during the build. Once I corrected the command to use the distribution-my-local-stack image name, it died with "RuntimeError: Failed to infer device type." That's when I noticed that the GPUs have not been passed through. If nvidia drivers are loaded, the command needs --device nvidia.com/gpu=all added. It also wouldn't hurt to add --cgroup-conf=memory.high=32g or something configurable.

Also, the command should probably include --rm to remove the ephemeral container created when the command was invoked.
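
Putting those flags together, the corrected command would look something like this (the image name here matches what podman actually built, and the memory value is just an example):

podman run -it --rm --device nvidia.com/gpu=all --cgroup-conf=memory.high=32g -p 5000:5000 -v /home/sgrubb/.llama/builds/docker/my-local-stack-run.yaml:/app/config.yaml localhost/distribution-my-local-stack:latest python -m llama_stack.distribution.server.server --yaml_config /app/config.yaml --port 5000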

In summary: there is a naming mismatch between the image that gets created and the image the run command tries to use, and GPUs are not being enabled.

stevegrubb avatar Oct 24 '24 20:10 stevegrubb

Are you trying to build a custom docker image yourself and run it? For the docker image flow, it is recommended to (1) build the docker image and (2) run it with docker run .... See some examples here: https://github.com/meta-llama/llama-stack/tree/main/distributions

Why did it not use the local image? Could it be it has distribution instead of llamastack?

Could you check the images you have locally and see if the names match?

podman images
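
For example, to narrow the listing down to the stack image from the build step:

podman images | grep my-local-stack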

Also, I chose vllm as the inference service. It downloaded and installed nvidia drivers during build. Once I corrected the command to have distribution, it dies with "RuntimeError: Failed to infer device type."

Are you using the inline vLLM implementation, or remote vLLM inference? What are the full build.yaml and run.yaml files you are using?

yanxi0830 avatar Oct 25 '24 01:10 yanxi0830

Are you trying to build a custom docker image yourself and run the docker image?

I am not building a custom docker image. I am following the existing examples. I pip installed the llama-stack package and followed the example. When "llama stack build" finished, it suggested that I run "llama stack run my-local-stack". I did that. It did not suggest using "docker run".
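
For reference, the sequence I followed was roughly:

pip install llama-stack==0.0.45
llama stack build
llama stack run my-local-stack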

Could you check the output of images you have locally and see if the names match?

I provided the information above. Podman says I have localhost/distribution-my-local-stack:latest. The "llama stack run" script is using llamastack-my-local-stack, which does not match.
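
As a temporary workaround (not a fix), the local image could presumably be retagged so that the name the run script expects resolves locally:

podman tag localhost/distribution-my-local-stack:latest llamastack-my-local-stack:latest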

I chose vllm from the menu in "llama stack build". It downloaded and installed vllm into the image. This is not remote vLLM, because the introductory instructions do not even say how to do that. I'm following the instructions at https://github.com/meta-llama/llama-stack/blob/main/docs/cli_reference.md and keeping it simple to just get this running.

stevegrubb avatar Oct 25 '24 02:10 stevegrubb

Podman says I have localhost/distribution-my-local-stack:latest. The "llama stack run" script is using llamastack-my-local-stack which does not match.

This is fixed in https://github.com/meta-llama/llama-stack/commit/cb43caa2c3cb3ff9d23eca281b6fda2c14e73ec1. Note that we are currently migrating away from using llama stack run to start docker images. You may simply run the following command to start the docker image you built.

podman run -it -p 5000:5000 -v /home/sgrubb/.llama/builds/docker/my-local-stack-run.yaml:/app/config.yaml --gpus all distribution-my-local-stack python -m llama_stack.distribution.server.server --yaml_config /app/config.yaml --port 5000

it needs to have --device nvidia.com/gpu=all added to the docker command

If you are running vLLM as an inline inference provider (cc @russellb), you will need to add the --gpus all flag to the docker command.

I'm following the instructions at https://github.com/meta-llama/llama-stack/blob/main/docs/cli_reference.md and keeping it simple to just get this running.

If you want to quickly get something running, you can refer to some of our pre-built distributions: https://github.com/meta-llama/llama-stack/tree/main/distributions

Please see the developer cookbook for which instructions to follow. Let me know if anything is unclear!

yanxi0830 avatar Oct 25 '24 04:10 yanxi0830

OK, the issue I reported is fixed in recent updates.

stevegrubb avatar Oct 29 '24 20:10 stevegrubb