llama-stack
How to specify the model type using the pre-built docker image?
System Info
Using Windows 11
Information
- [X] The official example scripts
- [X] My own modified scripts
🐛 Describe the bug
When running:
docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --disable-ipv6
I'm able to get the container going for a 3.1 model, but when I add a --yaml_config pointing at my Windows path:
docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --yaml_config C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:UserssivarPycharmProjectsllama_stack_learnerrun.yaml'
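One detail visible in this first traceback: the path Python received is 'C:UserssivarPycharmProjectsllama_stack_learnerrun.yaml', with every backslash gone, because the shell consumed them before the server started. A minimal demonstration in plain shell (nothing llama-stack specific); note that even a correctly quoted host path would still fail here, since --yaml_config is opened inside the container:

```shell
# Backslashes in an unquoted shell word are escape characters, so they are
# stripped before the program ever sees the argument:
printf '%s\n' C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml
# prints: C:UserssivarPycharmProjectsllama_stack_learnerrun.yaml

# Single quotes preserve them:
printf '%s\n' 'C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml'
```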
sivar@Odysseus MINGW64 ~/PycharmProjects/llama_stack_learner
$ docker run -it -p 5000:5000 -v C:/Users/sivar/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu --disable-ipv6
Resolved 12 providers
inner-inference => meta-reference
models => __routing_table__
inference => __autorouted__
inner-safety => meta-reference
inner-memory => meta-reference
shields => __routing_table__
safety => __autorouted__
memory_banks => __routing_table__
memory => __autorouted__
agents => meta-reference
telemetry => meta-reference
inspect => __builtin__
Loading model `Llama3.1-8B-Instruct`
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
/usr/local/lib/python3.10/site-packages/torch/__init__.py:955: UserWarning: torch.set_default_tensor_type() is deprecated as of PyTorch 2.1, please use torch.set_default_dtype() and torch.set_default_device() as alternatives. (Triggered internally at ../torch/csrc/tensor/python_tensor.cpp:432.)
  _C._set_default_tensor_type(t)
...
but I want it to load the 3.2 Vision model; attempts to call that model fail, e.g.:
$ curl -X POST http://localhost:5000/inference/chat_completion -H "Content-Type: application/json" -d
'{"model": " Llama3.2-11B-Vision-Instruct", "messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "Write me a 2 sentence poem about the moon."}], "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}}'
{"detail":"Invalid value: ` Llama3.2-11B-Vision-Instruct` not registered"}
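One thing worth checking before anything else: the "model" value in the request above has a leading space (" Llama3.2-11B-Vision-Instruct"), which the error message echoes back inside the backticks. The lookup appears to be an exact string match, so the space alone would cause this rejection even once the model is registered. A sketch of the trimmed request against the same endpoint:

```shell
# The trimmed payload, with no leading space in "model":
payload='{"model": "Llama3.2-11B-Vision-Instruct",
          "messages": [{"role": "system", "content": "You are a helpful assistant."},
                       {"role": "user", "content": "Write me a 2 sentence poem about the moon."}],
          "sampling_params": {"temperature": 0.7, "seed": 42, "max_tokens": 512}}'
# Sanity-check the JSON locally before sending it anywhere:
printf '%s' "$payload" | python3 -m json.tool > /dev/null && echo "payload ok"
# With the server up, the request itself is unchanged apart from the model string:
#   curl -X POST http://localhost:5000/inference/chat_completion \
#     -H "Content-Type: application/json" -d "$payload"
```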
I've tried pointing the docker container towards my local YAML file with the right model:
version: '2'
built_at: '2024-10-08T17:40:45.325529'
image_name: local
docker_image: null
conda_env: local
apis:
- shields
- agents
- models
- memory
- memory_banks
- inference
- safety
providers:
inference:
- provider_id: meta0
provider_type: meta-reference
config:
model: Llama3.2-11B-Vision-Instruct # Updated model name
quantization: null
torch_seed: null
max_seq_len: 4096
max_batch_size: 1
safety:
- provider_id: meta0
provider_type: meta-reference
config:
llama_guard_shield:
model: Llama-Guard-3-1B
excluded_categories: []
disable_input_check: false
disable_output_check: false
prompt_guard_shield:
model: Prompt-Guard-86M
memory:
- provider_id: meta0
provider_type: meta-reference
config: {}
agents:
- provider_id: meta0
provider_type: meta-reference
config:
persistence_store:
namespace: null
type: sqlite
db_path: ~/.llama/runtime/kvstore.db
telemetry:
- provider_id: meta0
provider_type: meta-reference
config: {}
but when I try this:
$ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml
I get this:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml'
Is there a better way to specify the right model_id?
(P.S. I do have the model downloaded.)
Error logs
see above
Expected behavior
see above
Change your docker run command to use --yaml_config /root/my-run.yaml, since the server reads the file inside the Docker container. E.g.
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml
See guide here: https://github.com/meta-llama/llama-stack/tree/main/distributions/meta-reference-gpu
I get this error:
sivar@Odysseus MINGW64 ~/PycharmProjects/llama_stack_learner
$ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Program Files/Git/root/my-run.yaml'
I installed llama-stack with pip, so maybe it's missing some local file? I just have my .yaml file sitting in my dummy repo.
@yanxi0830 any ideas on this? I'm not sure how to properly path my docker flows
I just have my .yaml file sitting in my dummy repo.
what is the exact path of your .yaml file?
C:\Users\sivar\PycharmProjects\llama_stack_learner\run.yaml
I'm guessing it may be the flag -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml that didn't mount the file correctly on Windows. Could you try -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:C:/Program Files/Git/root/my-run.yaml?
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v C:/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:C:/Program Files/Git/root/my-run.yaml --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml
returns
docker: invalid reference format: repository name (Git/root/my-run.yaml) must be lowercase.
See 'docker run --help'.
If I try to convert it to Unix paths with:
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v "/c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml" --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml
I get
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 274, in main
with open(yaml_config, "r") as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'C:/Program Files/Git/root/my-run.yaml'
So I wonder if there's a bigger problem I'm not seeing beyond pathing.
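The pathing may well be the whole problem. Git Bash (MSYS) rewrites any command-line argument that looks like an absolute POSIX path into a Windows path under the Git install directory, which is exactly where the bogus 'C:/Program Files/Git/root/my-run.yaml' prefix comes from. Two commonly used escapes are sketched below; MSYS_NO_PATHCONV and the double-slash trick are assumptions about this Git Bash setup, not something verified here:

```shell
# Option 1: disable MSYS path conversion for this one command.
# (With conversion off, the host side of -v may need to be spelled as a
# Windows path, e.g. C:/Users/sivar/... -- behavior varies by setup.)
MSYS_NO_PATHCONV=1 docker run -it -p 5000:5000 \
  -v ~/.llama:/root/.llama \
  -v /c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml \
  --gpus=all llamastack/llamastack-local-gpu --yaml_config /root/my-run.yaml

# Option 2: double the leading slash on container-side paths. MSYS leaves
# //-prefixed arguments alone, and Linux inside the container treats //root
# the same as /root:
#   docker run ... --yaml_config //root/my-run.yaml
```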
@Travis-Barton Could you try this to see if the file has been correctly mounted in the docker container?
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v "/c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml" --entrypoint /bin/sh llamastack/llamastack-local-gpu
# ls
# ls /root/my-run.yaml
That first command gives:
$ docker run -it -p 5000:5000 -v ~/.llama:/root/.llama -v "/c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml" --entrypoint /bin/sh llamastack/llamastack-local-gpu
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "C:/Program Files/Git/usr/bin/sh": stat C:/Program Files/Git/usr/bin/sh: no such file or directory: unknown.
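This error is Git Bash's path rewriting again: --entrypoint /bin/sh was rewritten to C:/Program Files/Git/usr/bin/sh before it reached the Docker daemon, which then looked for that path inside the image. Doubling the slash is a commonly suggested way to keep the entrypoint intact (an assumption about MSYS behavior, not verified here):

```shell
# Same debug command, with the entrypoint protected from path conversion:
docker run -it -p 5000:5000 \
  -v ~/.llama:/root/.llama \
  -v "/c/Users/sivar/PycharmProjects/llama_stack_learner/run.yaml:/root/my-run.yaml" \
  --entrypoint //bin/sh llamastack/llamastack-local-gpu
# then, at the container's shell prompt:
#   ls /root/my-run.yaml
```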
@yanxi0830 is there another way I can debug this? I'm admittedly new to docker and ChatGPT isn't proving very helpful here XD
Hey @Travis-Barton, we have significantly upgraded our docker images and documentation recently here: https://llama-stack.readthedocs.io/en/latest/getting_started/distributions/self_hosted_distro/index.html
Could you check which docker image you would like to use and follow the guide there to see if it solves your issue?
@yanxi0830 I'll give it a shot tonight!
@yanxi0830 I tried following both the conda and Docker instructions at https://llama-stack.readthedocs.io/en/latest/distributions/self_hosted_distro/meta-reference-gpu.html and neither seems to work.
docker run \
-it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-11B-Vision-Instruct
Unable to find image 'llamastack/distribution-meta-reference-gpu:latest' locally
latest: Pulling from llamastack/distribution-meta-reference-gpu
bf5ee5c528dc: Download complete
5862c84e1a84: Download complete
1013e0f07472: Download complete
011be492f64b: Download complete
9652a5db17d7: Download complete
22370d9525db: Download complete
2d429b9e73a6: Download complete
d447e55d51db: Download complete
9a2c149417f8: Download complete
Digest: sha256:bbf4acf96acaab6bdfd2b4fb03ebafb8cd4abc8c598fa333ddefd352997050f9
Status: Downloaded newer image for llamastack/distribution-meta-reference-gpu:latest
Setting CLI environment variable INFERENCE_MODEL => meta-llama/Llama-3.2-11B-Vision-Instruct
Using template meta-reference-gpu config file: /usr/local/lib/python3.10/site-packages/llama_stack/templates/meta-reference-gpu/run.yaml
Run configuration:
apis:
- agents
- inference
- memory
- safety
- telemetry
conda_env: meta-reference-gpu
datasets: []
docker_image: null
eval_tasks: []
image_name: meta-reference-gpu
memory_banks: []
metadata_store:
db_path: /root/.llama/distributions/meta-reference-gpu/registry.db
namespace: null
type: sqlite
models:
- metadata: {}
model_id: meta-llama/Llama-3.2-11B-Vision-Instruct
provider_id: meta-reference-inference
provider_model_id: null
providers:
agents:
- config:
persistence_store:
db_path: /root/.llama/distributions/meta-reference-gpu/agents_store.db
namespace: null
type: sqlite
provider_id: meta-reference
provider_type: inline::meta-reference
inference:
- config:
checkpoint_dir: 'null'
max_seq_len: 4096
model: meta-llama/Llama-3.2-11B-Vision-Instruct
provider_id: meta-reference-inference
provider_type: inline::meta-reference
memory:
- config:
kvstore:
db_path: /root/.llama/distributions/meta-reference-gpu/faiss_store.db
namespace: null
type: sqlite
provider_id: faiss
provider_type: inline::faiss
safety:
- config: {}
provider_id: llama-guard
provider_type: inline::llama-guard
telemetry:
- config: {}
provider_id: meta-reference
provider_type: inline::meta-reference
scoring_fns: []
shields: []
version: '2'
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 411, in <module>
main()
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 343, in main
impls = asyncio.run(construct_stack(config))
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/stack.py", line 185, in construct_stack
impls = await resolve_impls(
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 221, in resolve_impls
impl = await instantiate_provider(
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 308, in instantiate_provider
impl = await fn(*args)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/__init__.py", line 19, in get_provider_impl
await impl.initialize()
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/inference.py", line 56, in initialize
self.generator = LlamaModelParallelGenerator(self.config)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/model_parallel.py", line 58, in __init__
checkpoint_dir = model_checkpoint_dir(self.model)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/inline/inference/meta_reference/generation.py", line 63, in model_checkpoint_dir
assert checkpoint_dir.exists(), (
AssertionError: Could not find checkpoints in: /root/.llama/checkpoints/Llama3.2-11B-Vision-Instruct. Please download model using `llama download --model-id Llama3.2-11B-Vision-Instruct`
sivar@Odysseus MINGW64 ~/PycharmProjects/llama_stack_learner
$ docker run -it -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT llamastack/distribution-meta-reference-gpu --port $LLAMA_STACK_PORT --env INFERENCE_MODEL=meta-llama/Llama-3.2-11B-Vision-Instruct
Setting CLI environment variable INFERENCE_MODEL => meta-llama/Llama-3.2-11B-Vision-Instruct
Using template meta-reference-gpu config file: /usr/local/lib/python3.10/site-packages/llama_stack/templates/meta-reference-gpu/run.yaml
Run configuration: (identical to the run configuration printed above)
Traceback (most recent call last):
  [same traceback as above]
AssertionError: Could not find checkpoints in: /root/.llama/checkpoints/Llama3.2-11B-Vision-Instruct. Please download model using `llama download --model-id Llama3.2-11B-Vision-Instruct`
$ llama stack build --template meta-reference-gpu --image-type conda
llama stack run distributions/meta-reference-gpu/run.yaml \
--port 5001 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
usage: llama stack build [-h] [--config CONFIG] [--template TEMPLATE] [--list-templates | --no-list-templates] [--name NAME] [--image-type {conda,docker}]
llama stack build: error: You must specify a name for the build using --name when using a template
usage: llama [-h] {download,model,stack} ...
llama: error: unrecognized arguments: --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
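Both CLI errors point at a version mismatch rather than a usage mistake: the installed llama-stack appears to predate the docs being followed, since its `build` requires --name alongside --template and its `run` has no --env flag. A hedged sketch: upgrade first, or adapt to the flags the installed version advertises (the --name value below is arbitrary, and whether this CLI version reads INFERENCE_MODEL from the environment is an assumption):

```shell
# Simplest check: bring the CLI up to the version the docs describe.
pip install -U llama-stack

# Or adapt to the installed CLI's own usage strings:
#   llama stack build --template meta-reference-gpu --name meta-gpu --image-type conda
#   INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
#     llama stack run distributions/meta-reference-gpu/run.yaml --port 5001
```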
I'm not sure what I'm doing wrong, but I confess I'm frustrated by how difficult this is to get running. I really want to use Meta's distro and not Ollama. What can I do to fix this?
Also, for the above, I do have Llama 3.2 downloaded:
$ ls ~/.llama/checkpoints
Llama3.1-8B-Instruct Llama3.2-11B-Vision-Instruct Llama3.2-3B-Instruct
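Having the checkpoints on the host is exactly the catch: the AssertionError names the container path /root/.llama/checkpoints/Llama3.2-11B-Vision-Instruct, and neither distribution-meta-reference-gpu command above passes -v ~/.llama:/root/.llama, so the host's checkpoint directory is invisible inside the container. A sketch of the run with the mount and GPU access restored; the MSYS_NO_PATHCONV=1 prefix is a Git Bash precaution against path rewriting, an assumption rather than a verified requirement:

```shell
export LLAMA_STACK_PORT=5001
MSYS_NO_PATHCONV=1 docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  --gpus=all \
  llamastack/distribution-meta-reference-gpu \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-11B-Vision-Instruct
```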
This issue has been automatically marked as stale because it has not had activity within 60 days. It will be automatically closed if no further activity occurs within 30 days.
This issue has been automatically closed due to inactivity. Please feel free to reopen if you feel it is still relevant!