Llama3.2-1B only reply "<|end_of_text|>"

Open tw40210 opened this issue 1 year ago • 3 comments

Hi experts, I just tried to install llama-stack and run the test with Llama3.2-1B, but the response is really weird. Since my GPU has only 6 GB of memory, I can't try a bigger model to see whether the problem is specific to Llama3.2-1B. I just want to make sure I didn't miss anything in the "getting started" document. Could you kindly point out anything I might have gotten wrong that would lead to this result? Thank you very much!

My install:

git clone git@github.com:meta-llama/llama-stack.git

conda create -n stack python=3.10
conda activate stack

llama stack build
> Enter an unique name for identifying your Llama Stack build distribution (e.g. my-local-stack): my-local
> Enter the image type you want your distribution to be built with (docker or conda): conda

 Llama Stack is composed of several APIs working together. Let's configure the providers (implementations) you want to use for these APIs.
> Enter the API provider for the inference API: (default=meta-reference): meta-reference
> Enter the API provider for the safety API: (default=meta-reference): meta-reference
> Enter the API provider for the agents API: (default=meta-reference): meta-reference
> Enter the API provider for the memory API: (default=meta-reference): meta-reference
> Enter the API provider for the telemetry API: (default=meta-reference): meta-reference

llama stack configure my_local
Could not find my_local. Trying conda build name instead...
Configuration already exists at `/home/ivan/.llama/builds/conda/my_local-run.yaml`. Will overwrite...
Configuring API `inference`...
=== Configuring provider `meta-reference` for API inference...
Enter value for model (default: Llama3.1-8B-Instruct) (required): Llama3.2-1B            
Do you want to configure quantization? (y/n): n
Enter value for torch_seed (optional): 
Enter value for max_seq_len (default: 4096) (required): 
Enter value for max_batch_size (default: 1) (required): 

Configuring API `safety`...
=== Configuring provider `meta-reference` for API safety...
Do you want to configure llama_guard_shield? (y/n): n
Enter value for enable_prompt_guard (default: False) (optional): 

Configuring API `agents`...
=== Configuring provider `meta-reference` for API agents...
Enter `type` for persistence_store (options: redis, sqlite, postgres) (default: sqlite): 

Configuring SqliteKVStoreConfig:
Enter value for namespace (optional): 
Enter value for db_path (existing: /home/ivan/.llama/runtime/kvstore.db) (required): 

Configuring API `memory`...
=== Configuring provider `meta-reference` for API memory...
> Please enter the supported memory bank type your provider has for memory: vector

Configuring API `telemetry`...
=== Configuring provider `meta-reference` for API telemetry...

llama stack run my_local --disable-ipv6

Test

python -m llama_stack.apis.inference.client localhost 5000  --model=Llama3.2-1B

User>hello world, write me a 2 sentence poem about the moon
Assistant> <|end_of_text|>

My OS and GPU

PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.107.02             Driver Version: 550.107.02     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1660 Ti     Off |   00000000:01:00.0  On |                  N/A |
| N/A   80C    P0             28W /   80W |    3407MiB /   6144MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A      1847      G   /usr/bin/gnome-shell                            1MiB |
|    0   N/A  N/A      9258      C   ...envs/llamastack-my_local/bin/python       3352MiB |
+-----------------------------------------------------------------------------------------+
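
As a rough sanity check on the 6 GB constraint mentioned at the top, here is a minimal sketch that estimates the GPU memory needed for model weights alone. It assumes 2 bytes per parameter (bf16) and ignores the KV cache, activations, and CUDA context overhead. The parameter counts are approximate assumptions, not values reported by llama-stack, and `weight_memory_mib` is a hypothetical helper written only for this illustration.

```python
# Back-of-the-envelope estimate of GPU memory needed for model weights only.
# Assumptions (not taken from llama-stack): bf16 weights at 2 bytes per
# parameter, and the approximate parameter counts listed below.

BYTES_PER_PARAM_BF16 = 2
MIB = 1024 ** 2


def weight_memory_mib(num_params: float,
                      bytes_per_param: int = BYTES_PER_PARAM_BF16) -> float:
    """Return the memory in MiB taken by the weights alone."""
    return num_params * bytes_per_param / MIB


# Approximate parameter counts (assumed for illustration).
APPROX_PARAMS = {
    "Llama3.2-1B": 1.24e9,
    "Llama3.1-8B-Instruct": 8.03e9,
}

# Total memory reported by nvidia-smi for the GTX 1660 Ti above.
GPU_TOTAL_MIB = 6144

for name, num_params in APPROX_PARAMS.items():
    needed = weight_memory_mib(num_params)
    fits = needed < GPU_TOTAL_MIB
    print(f"{name}: ~{needed:.0f} MiB for weights, "
          f"fits in {GPU_TOTAL_MIB} MiB: {fits}")
```

Under these assumptions the 1B model's weights come in well under the 3352 MiB that the server process is using, while the default Llama3.1-8B-Instruct would not fit on this card without quantization.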

Oct 12 '24 14:10 tw40210
