lorax
ValueError: Adapter '/data/llama2-lora' is not compatible with model '/data/Llama-2-7b-chat-hf'. Use --model-id '/new-model/llama2-7b/Llama-2-7b-chat-hf' instead.
System Info
2024-01-10T09:14:20.356771Z INFO lorax_launcher: Args { model_id: "/data/Llama-2-7b-chat-hf", adapter_id: "/data/llama2-lora", source: "hub", adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, compile: false, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1023, max_total_tokens: 1024, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 1024, max_batch_total_tokens: Some(1024), max_waiting_tokens: 20, max_active_adapters: 128, adapter_cycle_time_s: 2, hostname: "e2bcf2fc09e3", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-01-10T09:14:20.356869Z INFO download: lorax_launcher: Starting download process.
2024-01-10T09:14:23.227147Z WARN lorax_launcher: cli.py:145 No safetensors weights found for model /data/Llama-2-7b-chat-hf at revision None. Converting PyTorch weights to safetensors.
2024-01-10T09:14:25.972567Z INFO lorax_launcher: convert.py:114 Convert: [1/2] -- Took: 0:00:02.741882
2024-01-10T09:14:33.450451Z INFO lorax_launcher: convert.py:114 Convert: [2/2] -- Took: 0:00:07.477435
2024-01-10T09:14:33.450778Z INFO lorax_launcher: cli.py:104 Files are already present on the host. Skipping download.
2024-01-10T09:14:33.972217Z INFO download: lorax_launcher: Successfully downloaded weights.
2024-01-10T09:14:33.972518Z INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-01-10T09:14:37.373745Z INFO lorax_launcher: flash_llama.py:74 Merging adapter weights from adapter_id /data/llama2-lora into model weights.
2024-01-10T09:14:37.375075Z ERROR lorax_launcher: server.py:235 Error when initializing model
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
volume=/home/user/data docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data -it ghcr.io/predibase/lorax:latest --model-id /data/Llama-2-7b-chat-hf --adapter-id /data/llama2-lora --max-input-length 1023 --max-total-tokens 1024 --max-batch-total-tokens 1024 --max-batch-prefill-tokens 1024
Expected behavior
I trained this LoRA on my local Llama 2 model, so why is it not compatible?
I also tried another way:
volume=/home/user/data docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data -it ghcr.io/predibase/lorax:latest --model-id /data/Llama-2-7b-chat-hf --max-input-length 1023 --max-total-tokens 1024 --max-batch-total-tokens 1024 --max-batch-prefill-tokens 1024
check.py:
from lorax import Client

client = Client("http://127.0.0.1:8080")

prompt = "[INST] Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? [/INST]"
print(client.generate(prompt, max_new_tokens=64).generated_text)

adapter_id = "/data/llama2-lora"
adapter_source = "local"
print(client.generate(prompt, max_new_tokens=64, adapter_id=adapter_id, adapter_source=adapter_source).generated_text)
python check.py
To find out how many clips Natalia sold altogether in April and May, we need to use the information given in the problem.
In April, Natalia sold clips to 48 of her friends. So, she sold a total of 48 clips in April.
In
Traceback (most recent call last):
File "check.py", line 12, in
Maybe it looks like this: #51
@abhibst I have already tried this solution, but it still errors:

python check.py
To find out how many clips Natalia sold altogether in April and May, we need to use the information given in the problem.
In April, Natalia sold clips to 48 of her friends. So, she sold a total of 48 clips in April.
In
Traceback (most recent call last):
  File "check.py", line 12, in <module>
    print(client.generate(prompt, max_new_tokens=64, adapter_id=adapter_id, adapter_source=adapter_source).generated_text)
  File "/home/azureuser/anaconda3/lib/python3.8/site-packages/lorax/client.py", line 157, in generate
    raise parse_error(resp.status_code, payload)
lorax.errors.GenerationError: Request failed during generation: Server error: Incorrect path_or_model_id: '/new-model/llama2-7b/Llama-2-7b-chat-hf'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
Hey @Senna1960321, sorry for the late reply!
For the first error you saw:
ValueError: Adapter '/data/llama2-lora' is not compatible with model '/data/Llama-2-7b-chat-hf'. Use --model-id '/new-model/llama2-7b/Llama-2-7b-chat-hf' instead.
This suggests you're running an older version of LoRAX. The error message was changed to a warning in #58. Can you try running docker pull ghcr.io/predibase/lorax:latest to get the latest image?
If you're still running into issues after that, then for the more recent errors, can you share the output of the following commands run from outside the container?
ls /home/user/data/Llama-2-7b-chat-hf
ls /home/user/data/llama2-lora
The error message is odd because it seems to suggest that it's looking for a model with path /new-model/llama2-7b/Llama-2-7b-chat-hf.
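One likely source for that path is the adapter itself: PEFT-style adapters record the base model they were trained against in adapter_config.json under base_model_name_or_path, and that value appears to be what the compatibility check compares against --model-id. Here is a minimal sketch for inspecting it from outside the container; the host paths below are assumptions taken from the reproduction command, so adjust them to your setup:

```python
import json
from pathlib import Path

# Assumed host-side paths, taken from the reproduction command above;
# adjust them to wherever the model and adapter actually live.
model_dir = Path("/home/user/data/Llama-2-7b-chat-hf")
adapter_dir = Path("/home/user/data/llama2-lora")

# PEFT adapters record the base model they were trained against in
# adapter_config.json under "base_model_name_or_path".
adapter_config = json.loads((adapter_dir / "adapter_config.json").read_text())
print("adapter base_model_name_or_path:", adapter_config.get("base_model_name_or_path"))
print("base model directory exists:", model_dir.is_dir())
```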
@tgaddair Thanks for your reply. I solved this problem by setting volume=new-model/llama2-7b and then running:

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/new-model/llama2-7b -it ghcr.io/predibase/lorax:latest --model-id /new-model/llama2-7b/Llama-2-7b-chat-hf --adapter-id /new-model/llama2-7b/llama2-lora

I fine-tuned this LoRA from Llama-2-7b-chat-hf at the path /new-model/llama2-7b/Llama-2-7b-chat-hf; I don't know why LoRAX only recognizes this path.

I have another question: when I use LoRAX, I find its inference answers are worse than with the normal way, even on the dataset I had already fine-tuned on, though the decrease in the quality of the generated responses is not that pronounced.
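For reference, the path requirement most likely comes from the adapter's own adapter_config.json: the base_model_name_or_path recorded there at training time was presumably /new-model/llama2-7b/Llama-2-7b-chat-hf, which would explain why LoRAX only accepted that location. An untested sketch of an alternative workaround, assuming the adapter sits at the host path below, is to rewrite that same field (see the inspection snippet above) so it matches the model path as mounted inside the container, letting any volume layout work:

```python
import json
from pathlib import Path

# Assumed host-side location of the adapter; adjust to your setup and
# back the file up before editing it.
adapter_config_path = Path("/home/user/data/llama2-lora/adapter_config.json")

config = json.loads(adapter_config_path.read_text())
print("recorded base model:", config.get("base_model_name_or_path"))

# Point the adapter at the base model path as mounted inside the container
# (e.g. /data/Llama-2-7b-chat-hf when using -v $volume:/data), so that
# --model-id and the adapter agree without restructuring the volume.
config["base_model_name_or_path"] = "/data/Llama-2-7b-chat-hf"
adapter_config_path.write_text(json.dumps(config, indent=2))
```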