llama-stack
How to specify the model type using the pre-built docker image?
System Info
Using Windows 11
Information
- [X] The official example scripts
- [X] My own modified scripts
🐛 Describe the bug
When running:
docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --disable-ipv6
I'm able to get the container going for a 3.1 model, but when I add a --yaml_config pointing at my Windows path:
docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --yaml_config C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:UserssivarPycharmProjectsllama_stack_learnerrun.yaml'
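One detail visible in this first traceback: the path Python received is 'C:UserssivarPycharmProjectsllama_stack_learnerrun.yaml', with every backslash gone, because the shell consumed them before the server started. A minimal demonstration in plain shell (nothing llama-stack specific); note that even a correctly quoted host path would still fail here, since --yaml_config is opened inside the container:

```shell
# Backslashes in an unquoted shell word are escape characters, so they are
# stripped before the program ever sees the argument:
printf '%s\n' C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml
# prints: C:UserssivarPycharmProjectsllama_stack_learnerrun.yaml

# Single quotes preserve them:
printf '%s\n' 'C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml'
```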
sivar@Odysseus MINGW64 ~/PycharmProjects/llama_stack_learner
$ docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --disable-ipv6
Resolved 12 providers
inner-inference => meta-reference
models => __routing_table__
inference => __autorouted__
inner-safety => meta-reference
inner-memory => meta-reference
shields => __routing_table__
safety => __autorouted__
memory_banks => __routing_table__
memory => __autorouted__
agents => meta-reference
telemetry => meta-reference
inspect => __builtin__
Loading model `Llama3.1-8B-Instruct`
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
/usr/local/lib/python3.10/site-packages/torch/__init__.py:955: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:432.)
  _C._set_default_tensor_type(t)
...
but I want it to load the 3.2 Vision model; attempts to call that model fail, e.g.:
$ curl -X POST http://localhost:5000/inference/chat_completion -H "Content-Type: application/json" -d
'{"model": " Llama3.2-11B-Vision-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a 2 sentence poem about the moon."}], "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}}'
{"detail":"Invalid value: ` Llama3.2-11B-Vision-Instruct` not registered"}
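One thing worth checking before anything else: the "model" value in the request above has a leading space (" Llama3.2-11B-Vision-Instruct"), which the error message echoes back inside the backticks. The lookup appears to be an exact string match, so the space alone would cause this rejection even once the model is registered. A sketch of the trimmed request against the same endpoint:

```shell
# The trimmed payload, with no leading space in "model":
payload='{"model": "Llama3.2-11B-Vision-Instruct",
          "messages": [{"role": "system", "content": "You are a helpful assistant."},
                       {"role": "user", "content": "Write me a 2 sentence poem about the moon."}],
          "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}}'
# Sanity-check the JSON locally before sending it anywhere:
printf '%s' "$payload" | python3 -m json.tool > /dev/null && echo "payload ok"
# With the server up, the request itself is unchanged apart from the model string:
#   curl -X POST http://localhost:5000/inference/chat_completion \
#     -H "Content-Type: application/json" -d "$payload"
```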
I've tried pointing the docker container towards my local YAML file with the right model:
version: '2'
built_at: '2024-10-08T17:40:45.325529'
image_name: local
docker_image: null
conda_env: local
apis:
- shields
- agents
- models
- memory
- memory_banks
- inference
- safety
providers:
inference:
- provider_id: meta0
provider_type: meta-reference
config:
model: Llama3.2-11B-Vision-Instruct # Updated model name
quantization: null
torch_seed: null
max_seq_len: 4096
max_batch_size: 1
safety:
- provider_id: meta0
provider_type: meta-reference
config:
llama_guard_shield:
model: Llama-Guard-3-1B
excluded_categories: []
disable_input_check: false
disable_output_check: false
prompt_guard_shield:
model: Prompt-Guard-86M
memory:
- provider_id: meta0
provider_type: meta-reference
config: {}
agents:
- provider_id: meta0
provider_type: meta-reference
config:
persistence_store:
namespace: null
type: sqlite
db_path: ~/.llama/runtime/kvstore.db
telemetry:
- provider_id: meta0
provider_type: meta-reference
config: {}
but when I try this:
$ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml
I get this:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml'
Is there a better way to specify the right model_id?
(P.S. I do have the model downloaded.)
Error logs
see above
Expected behavior
see above
Change your docker run command to use --yaml_config /root/my-run.yaml, since the server reads the file inside the Docker container. E.g.
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml
See guide here: https://github.com/meta-llama/llama-stack/tree/main/distributions/meta-reference-gpu
I get this error:
sivar@Odysseus MINGW64 ~/PycharmProjects/llama_stack_learner
$ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Program Files/Git/root/my-run.yaml'
I installed llama-stack with pip, so maybe it's missing some local file? I just have my .yaml file sitting in my dummy repo.
@yanxi0830 any ideas on this? I'm not sure how to properly path my docker flows
I just have my .yaml file sitting in my dummy repo.
what is the exact path of your .yaml file?
C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml
I'm guessing it may be the flag -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml that didn't mount the file correctly on Windows. Could you try -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:C:/Program Files/Git/root/my-run.yaml?
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:C:/Program Files/Git/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml
returns
docker: invalid reference format: repository name (Git/root/my-run.yaml) must be lowercase.
See 'docker run --help'.
If I try to convert it to Unix paths with:
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v "/c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml" --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml
I get
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Program Files/Git/root/my-run.yaml'
So I wonder if there's a bigger problem I'm not seeing beyond pathing.
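The pathing may well be the whole problem. Git Bash (MSYS) rewrites any command-line argument that looks like an absolute POSIX path into a Windows path under the Git install directory, which is exactly where the bogus 'C:/Program Files/Git/root/my-run.yaml' prefix comes from. Two commonly used escapes are sketched below; MSYS_NO_PATHCONV and the double-slash trick are assumptions about this Git Bash setup, not something verified here:

```shell
# Option 1: disable MSYS path conversion for this one command.
# (With conversion off, the host side of -v may need to be spelled as a
# Windows path, e.g. C:/Users/sivar/... -- behavior varies by setup.)
MSYS_NO_PATHCONV=1 docker run -it -p 5000:5000 \
  -v ~/.llama:/root/.llama \
  -v /c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml \
  --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml

# Option 2: double the leading slash on container-side paths. MSYS leaves
# //-prefixed arguments alone, and Linux inside the container treats //root
# the same as /root:
#   docker run ... --yaml_config //root/my-run.yaml
```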
@Travis-Barton Could you try this to see if the file has been correctly mounted in the docker container?
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v "/c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml" --entrypoint /bin/sh llamastack/llamastack-local-gpu
# ls
# ls /root/my-run.yaml
That first command gives:
$ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v "/c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml" --entrypoint /bin/sh llamastack/llamastack-local-gpu
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "C:/Program Files/Git/usr/bin/sh": stat C:/Program Files/Git/usr/bin/sh: no such file or directory: unknown.
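This error is Git Bash's path rewriting again: --entrypoint /bin/sh was rewritten to C:/Program Files/Git/usr/bin/sh before it reached the Docker daemon, which then looked for that path inside the image. Doubling the slash is a commonly suggested way to keep the entrypoint intact (an assumption about MSYS behavior, not verified here):

```shell
# Same debug command, with the entrypoint protected from path conversion:
docker run -it -p 5000:5000 \
  -v ~/.llama:/root/.llama \
  -v "/c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml" \
  --entrypoint //bin/sh llamastack/llamastack-local-gpu
# then, at the container's shell prompt:
#   ls /root/my-run.yaml
```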
@yanxi0830 is there another way I can debug this? I'm admittedly new to docker and ChatGPT isn't proving very helpful here XD
Hey @Travis-Barton, we have significantly upgraded our docker images and documentation recently here: https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/self_hosted_distro/index.html
Could you check which docker image you would like to use and follow the guide there to see if it solves your issue?
@yanxi0830 I'll give it a shot tonight!
@yanxi0830 I tried following both the conda and Docker instructions at https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html and neither seems to work.
docker run \
-it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-11B-Vision-Instruct
Unable to find image 'llamastack/distribution-meta-reference-gpu:latest' locally
latest: Pulling from llamastack/distribution-meta-reference-gpu
bf5ee5c528dc: Download complete
5862c84e1a84: Download complete
1013e0f07472: Download complete
011be492f64b: Download complete
9652a5db17d7: Download complete
22370d9525db: Download complete
2d429b9e73a6: Download complete
d447e55d51db: Download complete
9a2c149417f8: Download complete
Digest: sha256:bbf4acf96acaab6bdfd2b4fb03ebafb8cd4abc8c598fa333ddefd352997050f9
Status: Downloaded newer image for llamastack/distribution-meta-reference-gpu:latest
Setting CLI environment variable INFERENCE_MODEL => meta-llama/Llama-3.2-11B-Vision-Instruct
Using template meta-reference-gpu config file: /usr/local/lib/python3.10/site-packages/llama_stack/templates/meta-reference-gpu/run.yaml
Run configuration:
apis:
- agents
- inference
- memory
- safety
- telemetry
conda_env: meta-reference-gpu
datasets: []
docker_image: null
eval_tasks: []
image_name: meta-reference-gpu
memory_banks: []
metadata_store:
db_path: /root/.llama/distributions/meta-reference-gpu/registry.db
namespace: null
type: sqlite
models:
- metadata: {}
model_id: meta-llama/Llama-3.2-11B-Vision-Instruct
provider_id: meta-reference-inference
provider_model_id: null
providers:
agents:
- config:
persistence_store:
db_path: /root/.llama/distributions/meta-reference-gpu/agents_store.db
namespace: null
type: sqlite
provider_id: meta-reference
provider_type: inline::meta-reference
inference:
- config:
checkpoint_dir: 'null'
max_seq_len: 4096
model: meta-llama/Llama-3.2-11B-Vision-Instruct
provider_id: meta-reference-inference
provider_type: inline::meta-reference
memory:
- config:
kvstore:
db_path: /root/.llama/distributions/meta-reference-gpu/faiss_store.db
namespace: null
type: sqlite
provider_id: faiss
provider_type: inline::faiss
safety:
- config: {}
provider_id: llama-guard
provider_type: inline::llama-guard
telemetry:
- config: {}
provider_id: meta-reference
provider_type: inline::meta-reference
scoring_fns: []
shields: []
version: '2'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 411, in <module>
main()
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in main
impls = asyncio.run(construct_stack(config))
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/stack.py", line 185, in construct_stack
impls = await resolve_impls(
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 221, in resolve_impls
impl = await instantiate_provider(
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 308, in instantiate_provider
impl = await fn(*args)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/__init__.py", line 19, in get_provider_impl
await impl.initialize()
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/inference.py", line 56, in initialize
self.generator = LlamaModelParallelGenerator(self.config)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/model_parallel.py", line 58, in __init__
checkpoint_dir = model_checkpoint_dir(self.model)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/generation.py", line 63, in model_checkpoint_dir
assert checkpoint_dir.exists(), (
AssertionError: Could not find checkpoints in: /root/.llama/checkpoints/Llama3.2-11B-Vision-Instruct. Please download model using `llama download --model-id Llama3.2-11B-Vision-Instruct`
sivar@Odysseus MINGW64 ~/PycharmProjects/llama_stack_learner
$ docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT llamastack/distribution-meta-reference-gpu --port $LLAMA_STACK_PORT --env INFERENCE_MODEL=meta-llama/Llama-3.2-11B-Vision-Instruct
Setting CLI environment variable INFERENCE_MODEL => meta-llama/Llama-3.2-11B-Vision-Instruct
Using template meta-reference-gpu config file: /usr/local/lib/python3.10/site-packages/llama_stack/templates/meta-reference-gpu/run.yaml
Run configuration: (identical to the run configuration printed above)
Traceback (most recent call last):
  [same traceback as above]
AssertionError: Could not find checkpoints in: /root/.llama/checkpoints/Llama3.2-11B-Vision-Instruct. Please download model using `llama download --model-id Llama3.2-11B-Vision-Instruct`
$ llama stack build --template meta-reference-gpu --image-type conda
llama stack run distributions/meta-reference-gpu/run.yaml \
--port 5001 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates | --no-list-templates] [--name NAME] [--image-type {conda,docker}]
llama stack build: error: You must specify a name for the build using --name when using a template
usage: llama [-h] {download,model,stack} ...
llama: error: unrecognized arguments: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
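Both CLI errors point at a version mismatch rather than a usage mistake: the installed llama-stack appears to predate the docs being followed, since its `build` requires --name alongside --template and its `run` has no --env flag. A hedged sketch: upgrade first, or adapt to the flags the installed version advertises (the --name value below is arbitrary, and whether this CLI version reads INFERENCE_MODEL from the environment is an assumption):

```shell
# Simplest check: bring the CLI up to the version the docs describe.
pip install -U llama-stack

# Or adapt to the installed CLI's own usage strings:
#   llama stack build --template meta-reference-gpu --name meta-gpu --image-type conda
#   INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
#     llama stack run distributions/meta-reference-gpu/run.yaml --port 5001
```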
I'm not sure what I'm doing wrong, but I confess I'm frustrated by how difficult this is to get running. I really want to use Meta's distro and not Ollama. What can I do to fix this?
Also, for the above, I do have Llama 3.2 downloaded:
$ ls ~/.llama/checkpoints
Llama3.1-8B-Instruct Llama3.2-11B-Vision-Instruct Llama3.2-3B-Instruct
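Having the checkpoints on the host is exactly the catch: the AssertionError names the container path /root/.llama/checkpoints/Llama3.2-11B-Vision-Instruct, and neither distribution-meta-reference-gpu command above passes -v ~/.llama:/root/.llama, so the host's checkpoint directory is invisible inside the container. A sketch of the run with the mount and GPU access restored; the MSYS_NO_PATHCONV=1 prefix is a Git Bash precaution against path rewriting, an assumption rather than a verified requirement:

```shell
export LLAMA_STACK_PORT=5001
MSYS_NO_PATHCONV=1 docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  --gpus=all \
  llamastack/distribution-meta-reference-gpu \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-11B-Vision-Instruct
```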
This issue has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant!