llama-stack
Llama3.1-8B-Instruct is already there, but llama stack cannot find it. Neither conda nor docker works.
Failed to run the stack through conda: llama stack run stack-3.2-1B --port 5000 --disable-ipv6
See https://github.com/meta-llama/llama-stack/issues/194. I don't know why the stack needs to connect to the address [::ffff:0.0.2.208].
Failed to run the stack through docker. Does the stack not support .pth? I also downloaded the safetensors 1B model from Hugging Face, and that doesn't work either.
root@720:~/.llama/checkpoints/Llama3.1-8B-Instruct# ls -alh
total 15G
drwxr-xr-x 2 root root 4.0K Oct 5 04:30 .
drwxr-xr-x 11 root root 4.0K Oct 5 04:36 ..
-rw-r--r-- 1 root root 15G Jul 20 05:55 consolidated.00.pth
-rw-r--r-- 1 root root 199 Jul 20 05:55 params.json
-rw-r--r-- 1 root root 8.7M Sep 29 15:38 tokenizer.json
-rw-r--r-- 1 root root 489K Oct 5 04:30 tokenizer.model
root@720:~/.llama/checkpoints/Llama3.1-8B-Instruct# docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
Resolved 8 providers in topological order
Api.models: routing_table
Api.inference: router
Api.shields: routing_table
Api.safety: router
Api.memory_banks: routing_table
Api.memory: router
Api.agents: meta-reference
Api.telemetry: meta-reference
Traceback (most recent call last):
File "/usr/local/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/local/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 351, in <module>
fire.Fire(main)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 135, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 468, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/site-packages/fire/core.py", line 684, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 288, in main
impls, specs = asyncio.run(resolve_impls_with_routing(config))
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 104, in resolve_impls_with_routing
impl = await instantiate_provider(spec, deps, configs[api])
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 174, in instantiate_provider
impl = await instantiate_provider(
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/resolver.py", line 192, in instantiate_provider
impl = await fn(*args)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/__init__.py", line 18, in get_provider_impl
await impl.initialize()
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/inference.py", line 38, in initialize
self.generator = LlamaModelParallelGenerator(self.config)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/model_parallel.py", line 70, in __init__
checkpoint_dir = model_checkpoint_dir(self.model)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/impls/meta_reference/inference/generation.py", line 54, in model_checkpoint_dirassert checkpoint_dir.exists(), (
AssertionError: Could not find checkpoints in: /root/.llama/checkpoints/Llama3.1-8B-Instruct. Please download model using `llama download --model-id Llama3.1-8B-Instruct`
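For reference, the assertion that fails above boils down to a plain existence check on the per-model checkpoint directory. The following is only a sketch reconstructed from the traceback and the error message, not the actual generation.py source: inside the container the path resolves to /root/.llama/checkpoints/Llama3.1-8B-Instruct, so it can only pass if the host's ~/.llama is mounted there with the checkpoints in place.

# Sketch of the failing check, reconstructed from the traceback above;
# not the real llama-stack implementation.
from pathlib import Path

def model_checkpoint_dir(model_id: str) -> Path:
    # Inside the container Path.home() is /root, so this becomes
    # /root/.llama/checkpoints/<model_id>.
    checkpoint_dir = Path.home() / ".llama" / "checkpoints" / model_id
    assert checkpoint_dir.exists(), (
        f"Could not find checkpoints in: {checkpoint_dir}. "
        f"Please download model using `llama download --model-id {model_id}`"
    )
    return checkpoint_dir

print(model_checkpoint_dir("Llama3.1-8B-Instruct"))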
For the conda error (torch.distributed.DistNetworkError), do you have the output of nvidia-smi?
For the docker error, could you try going inside the docker container to see if the checkpoint directory has been successfully mounted?
$ docker run -it -p 5009:5009 -v ~/.llama:/root/.llama --gpus=all --entrypoint /bin/sh llamastack/llamastack-local-gpu
Inside the docker container, you should be able to see something like the following.
# ls
llamastack-build.yaml llamastack-run.yaml
# ls /root/.llama
builds checkpoints client distributions runtime
# cd /root/.llama
# cd checkpoints
# ls
Llama-Guard-3-1B Llama3.1-8B-Instruct Llama3.2-1B-Instruct Prompt-Guard-86M
Llama-Guard-3-8B Llama3.2-11B-Vision-Instruct Llama3.2-3B-Instruct
Llama3.1-8B Llama3.2-1B Llama3.2-90B-Vision-Instruct
# cd Llama3.1-8B-Instruct
# ls -l
total 15686344
-rw-r--r-- 1 root root 150 Sep 8 02:59 checklist.chk
-rw-r--r-- 1 root root 16060617592 Sep 8 03:00 consolidated.00.pth
-rw-r--r-- 1 root root 199 Sep 8 02:59 params.json
-rw-r--r-- 1 root root 2183982 Sep 8 02:59 tokenizer.model
@yanxi0830 Docker: the container can now find the weight files, but a new issue has come up: https://github.com/meta-llama/llama-stack/issues/242
conda:
(llamastack-stack-3.2-1B) root@720:~/.llama/checkpoints# llama stack run stack-3.2-1B --port 5000 --disable-ipv6
Resolved 8 providers in topological order
Api.models: routing_table
Api.inference: router
Api.shields: routing_table
Api.safety: router
Api.memory_banks: routing_table
Api.memory: router
Api.agents: meta-reference
Api.telemetry: meta-reference
[W1012 03:20:14.408988300 socket.cpp:697] [c10d] The client socket has failed to connect to [::ffff:0.0.2.208]:47231 (errno: 110 - Connection timed out).
Ctrl-C detected. Aborting...
(llamastack-stack-3.2-1B) root@720:~/.llama/checkpoints# nvidia-smi
Sat Oct 12 03:22:06 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla P40 Off | 00000000:05:00.0 Off | Off |
| N/A 34C P0 49W / 250W | 0MiB / 24576MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
| 1 Tesla P40 Off | 00000000:42:00.0 Off | Off |
| N/A 38C P0 45W / 250W | 0MiB / 24576MiB | 1% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
(llamastack-stack-3.2-1B) root@720:~/.llama/checkpoints#
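A note on the conda failure above: the client socket times out connecting to [::ffff:0.0.2.208]:47231, i.e. whatever address the torch.distributed rendezvous resolved for this host. The prompt shows the hostname is 720, and the bare number 720 parsed as a raw 32-bit IPv4 value is 0.0.2.208, so my guess (unconfirmed) is that the numeric hostname is being misread as an address. A purely diagnostic standard-library check, not part of llama-stack:

# Diagnostic sketch: see what the local hostname actually resolves to.
# If it maps to an address the machine does not listen on, the
# torch.distributed rendezvous shown above cannot connect.
import socket

hostname = socket.gethostname()
print("hostname:", hostname)
for family, _, _, _, sockaddr in socket.getaddrinfo(hostname, None):
    print(" ", socket.AddressFamily(family).name, sockaddr[0])

If the resolved address looks wrong, exporting MASTER_ADDR=127.0.0.1 (and a free MASTER_PORT) before llama stack run is one thing worth trying, assuming the model-parallel setup uses torch.distributed's default env:// rendezvous; I have not verified that it does.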
Moving discussion to https://github.com/meta-llama/llama-stack/issues/242
What do we do if it is NOT mounted?
PS C:\Users\sivar> docker run -it -p 5009:5009 -v ~/.llama:/root/.llama --gpus=all --entrypoint /bin/sh llamastack/llamastack-local-gpu
# ls
llamastack-build.yaml llamastack-run.yaml
# ls /root/.llama
#
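If /root/.llama comes up empty like this, the bind mount itself was created but the source directory Docker resolved on the host has nothing in it. A sanity check I would try (my own suggestion, not an official step) is confirming on the host that the directory you expect ~/.llama to point at actually contains the checkpoints; on Windows the models may have been downloaded under a different home (for example inside WSL), leaving C:\Users\<user>\.llama empty.

# Host-side sanity check (run on the Windows host, outside the container).
# The expected layout (checkpoints/<model-id>/consolidated.00.pth, params.json,
# tokenizer.model, ...) is taken from the listings earlier in this thread.
from pathlib import Path

llama_home = Path.home() / ".llama"
checkpoints = llama_home / "checkpoints"

print("expected mount source:", llama_home)
if not checkpoints.is_dir():
    print("no checkpoints directory here; the models were probably downloaded "
          "under a different home directory (e.g. inside WSL)")
else:
    for model_dir in sorted(checkpoints.iterdir()):
        print(" ", model_dir.name)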