llama-stack
Llama3.1-8B-Instruct is already there, but llama stack cannot find it. Neither conda nor docker works.
Failed to run the stack through conda: llama stack run stack-3.2-1B --port 5000 --disable-ipv6
See https://github.com/meta-llama/llama-stack/issues/194. I don't know why the stack needs to connect to the address [::ffff:0.0.2.208].
Failed to run the stack through docker. Does the stack not support .pth? I also downloaded the safetensors 1B model from Hugging Face, and that doesn't work either.
root@720:~/.llama/checkpoints/Llama3.1-8B-Instruct# ls -alh
total 15G
drwxr-xr-x 2 root root 4.0K Oct 5 04:30 .
drwxr-xr-x 11 root root 4.0K Oct 5 04:36 ..
-rw-r--r-- 1 root root 15G Jul 20 05:55 consolidated.00.pth
-rw-r--r-- 1 root root 199 Jul 20 05:55 params.json
-rw-r--r-- 1 root root 8.7M Sep 29 15:38 tokenizer.json
-rw-r--r-- 1 root root 489K Oct 5 04:30 tokenizer.model
root@720:~/.llama/checkpoints/Llama3.1-8B-Instruct# docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
Resolved 8 providers in topological order
Api.models: routing_table
Api.inference: router
Api.shields: routing_table
Api.safety: router
Api.memory_banks: routing_table
Api.memory: router
Api.agents: meta-reference
Api.telemetry: meta-reference
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 351, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 288, in main
impls, specs = asyncio.run(resolve_impls_with_routing(config))
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 104, in resolve_impls_with_routing
impl = await instantiate_provider(spec, deps, configs[api])
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 174, in instantiate_provider
impl = await instantiate_provider(
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 192, in instantiate_provider
impl = await fn(*args)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/__init__.py", line 18, in get_provider_impl
await impl.initialize()
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/inference.py", line 38, in initialize
self.generator = LlamaModelParallelGenerator(self.config)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/model_parallel.py", line 70, in __init__
checkpoint_dir = model_checkpoint_dir(self.model)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/generation.py", line 54, in model_checkpoint_dirassert checkpoint_dir.exists(), (
AssertionError: Could not find checkpoints in: /root/.llama/checkpoints/Llama3.1-8B-Instruct. Please download model using `llama download --model-id Llama3.1-8B-Instruct`
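For reference, the assertion that fails above boils down to a plain existence check on the per-model checkpoint directory. The following is only a sketch reconstructed from the traceback and the error message, not the actual generation.py source: inside the container the path resolves to /root/.llama/checkpoints/Llama3.1-8B-Instruct, so it can only pass if the host's ~/.llama is mounted there with the checkpoints in place.

# Sketch of the failing check, reconstructed from the traceback above;
# not the real llama-stack implementation.
from pathlib import Path

def model_checkpoint_dir(model_id: str) -> Path:
    # Inside the container Path.home() is /root, so this becomes
    # /root/.llama/checkpoints/<model_id>.
    checkpoint_dir = Path.home() / ".llama" / "checkpoints" / model_id
    assert checkpoint_dir.exists(), (
        f"Could not find checkpoints in: {checkpoint_dir}. "
        f"Please download model using `llama download --model-id {model_id}`"
    )
    return checkpoint_dir

print(model_checkpoint_dir("Llama3.1-8B-Instruct"))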
For the conda error (torch.distributed.DistNetworkError), do you have the output of nvidia-smi?
For the docker error, could you try going inside the docker container to see if the checkpoint directory has been successfully mounted?
$ docker run -it -p 5009:5009 -v ~/.llama:/root/.llama --gpus=all --entrypoint /bin/sh llamastack/llamastack-local-gpu
Inside the docker container, you should be able to see something like the following.
# ls
llamastack-build.yaml llamastack-run.yaml
# ls /root/.llama
builds checkpoints client distributions runtime
# cd /root/.llama
# cd checkpoints
# ls
Llama-Guard-3-1B Llama3.1-8B-Instruct Llama3.2-1B-Instruct Prompt-Guard-86M
Llama-Guard-3-8B Llama3.2-11B-Vision-Instruct Llama3.2-3B-Instruct
Llama3.1-8B Llama3.2-1B Llama3.2-90B-Vision-Instruct
# cd Llama3.1-8B-Instruct
# ls -l
total 15686344
-rw-r--r-- 1 root root 150 Sep 8 02:59 checklist.chk
-rw-r--r-- 1 root root 16060617592 Sep 8 03:00 consolidated.00.pth
-rw-r--r-- 1 root root 199 Sep 8 02:59 params.json
-rw-r--r-- 1 root root 2183982 Sep 8 02:59 tokenizer.model
@yanxi0830 Docker: the container can now find the weight files, but a new issue has come up: https://github.com/meta-llama/llama-stack/issues/242
conda:
(llamastack-stack-3.2-1B) root@720:~/.llama/checkpoints# llama stack run stack-3.2-1B --port 5000 --disable-ipv6
Resolved 8 providers in topological order
Api.models: routing_table
Api.inference: router
Api.shields: routing_table
Api.safety: router
Api.memory_banks: routing_table
Api.memory: router
Api.agents: meta-reference
Api.telemetry: meta-reference
[W1012 03:20:14.408988300 socket.cpp:697] [c10d] The client socket has failed to connect to [::ffff:0.0.2.208]:47231 (errno: 110 - Connection timed out).
Ctrl-C detected. Aborting...
(llamastack-stack-3.2-1B) root@720:~/.llama/checkpoints# nvidia-smi
Sat Oct 12 03:22:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P40 Off | 00000000:05:00.0 Off | Off |
| N/A 34C P0 49W / 250W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla P40 Off | 00000000:42:00.0 Off | Off |
| N/A 38C P0 45W / 250W | 0MiB / 24576MiB | 1% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
(llamastack-stack-3.2-1B) root@720:~/.llama/checkpoints#
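A note on the conda failure above: the client socket times out connecting to [::ffff:0.0.2.208]:47231, i.e. whatever address the torch.distributed rendezvous resolved for this host. The prompt shows the hostname is 720, and the bare number 720 parsed as a raw 32-bit IPv4 value is 0.0.2.208, so my guess (unconfirmed) is that the numeric hostname is being misread as an address. A purely diagnostic standard-library check, not part of llama-stack:

# Diagnostic sketch: see what the local hostname actually resolves to.
# If it maps to an address the machine does not listen on, the
# torch.distributed rendezvous shown above cannot connect.
import socket

hostname = socket.gethostname()
print("hostname:", hostname)
for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, None):
    print(" ", socket.AddressFamily(family).name, sockaddr[0])

If the resolved address looks wrong, exporting MASTER_ADDR=127.0.0.1 (and a free MASTER_PORT) before llama stack run is one thing worth trying, assuming the model-parallel setup uses torch.distributed's default env:// rendezvous; I have not verified that it does.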
Moving discussion to https://github.com/meta-llama/llama-stack/issues/242
What do we do if it is NOT mounted?
PS C:\Users\sivar> docker run -it -p 5009:5009 -v ~/.llama:/root/.llama --gpus=all --entrypoint /bin/sh llamastack/llamastack-local-gpu
# ls
llamastack-build.yaml llamastack-run.yaml
# ls /root/.llama
#
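If /root/.llama comes up empty like this, the bind mount itself was created but the source directory Docker resolved on the host has nothing in it. A sanity check I would try (my own suggestion, not an official step) is confirming on the host that the directory you expect ~/.llama to point at actually contains the checkpoints; on Windows the models may have been downloaded under a different home (for example inside WSL), leaving C:\Users\<user>\.llama empty.

# Host-side sanity check (run on the Windows host, outside the container).
# The expected layout (checkpoints/<model-id>/consolidated.00.pth, params.json,
# tokenizer.model, ...) is taken from the listings earlier in this thread.
from pathlib import Path

llama_home = Path.home() / ".llama"
checkpoints = llama_home / "checkpoints"

print("expected mount source:", llama_home)
if not checkpoints.is_dir():
    print("no checkpoints directory here; the models were probably downloaded "
          "under a different home directory (e.g. inside WSL)")
else:
    for model_dir in sorted(checkpoints.iterdir()):
        print(" ", model_dir.name)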