llama-stack
llama-stack copied to clipboard
Llama3.2-1B only reply "<|end_of_text|>"
Hi Expert, I just tried to to install llama-stack and run the test with Llama3.2-1B but I found the response is really weird. Since my GPU RAM is only 6GB, I can't try bigger model to see if its the problem of "Llama3.2-1B". Just want to make sure I didn't miss anything in the "get start" document. Could you kindly help point out anything I might get wrong to lead this result? Thank you very much!
My install:
git clone [email protected]:meta-llama/llama-stack.git
conda create -n stack python=3.10
conda activate stack
llama stack build
> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local
> Enter the image type you want your distribution to be built with (docker or conda): conda
Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter the API provider for the inference API: (default=meta-reference): meta-reference
> Enter the API provider for the safety API: (default=meta-reference): meta-reference
> Enter the API provider for the agents API: (default=meta-reference): meta-reference
> Enter the API provider for the memory API: (default=meta-reference): meta-reference
> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference
llama stack configure my_local
Could not find my_local. Trying conda build name instead...
Configuration already exists at `/home/ivan/.llama/builds/conda/my_local-run.yaml`. Will overwrite...
Configuring API `inference`...
=== Configuring provider `meta-reference` for API inference...
Enter value for model (default: Llama3.1-8B-Instruct) (required): Llama3.2-1B
Do you want to configure quantization? (y/n): n
Enter value for torch_seed (optional):
Enter value for max_seq_len (default: 4096) (required):
Enter value for max_batch_size (default: 1) (required):
Configuring API `safety`...
=== Configuring provider `meta-reference` for API safety...
Do you want to configure llama_guard_shield? (y/n): n
Enter value for enable_prompt_guard (default: False) (optional):
Configuring API `agents`...
=== Configuring provider `meta-reference` for API agents...
Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite):
Configuring SqliteKVStoreConfig:
Enter value for namespace (optional):
Enter value for db_path (existing: /home/ivan/.llama/runtime/kvstore.db) (required):
Configuring API `memory`...
=== Configuring provider `meta-reference` for API memory...
> Please enter the supported memory bank type your provider has for memory: vector
Configuring API `telemetry`...
=== Configuring provider `meta-reference` for API telemetry...
llama stack run my_local --disable-ipv6
Test
python -m llama_stack.apis.inference.client localhost 5000 --model=Llama3.2-1B
User>hello world, write me a 2 sentence poem about the moon
Assistant> <|end_of_text|>
My OS and GPU
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02 Driver Version: 550.107.02 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1660 Ti Off | 00000000:01:00.0 On | N/A |
| N/A 80C P0 28W / 80W | 3407MiB / 6144MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1847 G /usr/bin/gnome-shell 1MiB |
| 0 N/A N/A 9258 C ...envs/llamastack-my_local/bin/python 3352MiB |
+-----------------------------------------------------------------------------------------+