lorax
ValueError: Adapter '/data/llama2-lora' is not compatible with model '/data/Llama-2-7b-chat-hf'. Use --model-id '/new-model/llama2-7b/Llama-2-7b-chat-hf' instead.
System Info
2024-01-10T09:14:20.356771Z INFO lorax_launcher: Args { model_id: "/data/Llama-2-7b-chat-hf", adapter_id: "/data/llama2-lora", source: "hub", adapter_source: "hub", revision: None, validation_workers: 2, sharded: None, num_shard: None, quantize: None, compile: false, dtype: None, trust_remote_code: false, max_concurrent_requests: 128, max_best_of: 2, max_stop_sequences: 4, max_input_length: 1023, max_total_tokens: 1024, waiting_served_ratio: 1.2, max_batch_prefill_tokens: 1024, max_batch_total_tokens: Some(1024), max_waiting_tokens: 20, max_active_adapters: 128, adapter_cycle_time_s: 2, hostname: "e2bcf2fc09e3", port: 80, shard_uds_path: "/tmp/lorax-server", master_addr: "localhost", master_port: 29500, huggingface_hub_cache: Some("/data"), weights_cache_override: None, disable_custom_kernels: false, cuda_memory_fraction: 1.0, json_output: false, otlp_endpoint: None, cors_allow_origin: [], watermark_gamma: None, watermark_delta: None, ngrok: false, ngrok_authtoken: None, ngrok_edge: None, env: false, download_only: false }
2024-01-10T09:14:20.356869Z INFO download: lorax_launcher: Starting download process.
2024-01-10T09:14:23.227147Z WARN lorax_launcher: cli.py:145 No safetensors weights found for model /data/Llama-2-7b-chat-hf at revision None. Converting PyTorch weights to safetensors.
2024-01-10T09:14:25.972567Z INFO lorax_launcher: convert.py:114 Convert: [1/2] -- Took: 0:00:02.741882
2024-01-10T09:14:33.450451Z INFO lorax_launcher: convert.py:114 Convert: [2/2] -- Took: 0:00:07.477435
2024-01-10T09:14:33.450778Z INFO lorax_launcher: cli.py:104 Files are already present on the host. Skipping download.
2024-01-10T09:14:33.972217Z INFO download: lorax_launcher: Successfully downloaded weights.
2024-01-10T09:14:33.972518Z INFO shard-manager: lorax_launcher: Starting shard rank=0
2024-01-10T09:14:37.373745Z INFO lorax_launcher: flash_llama.py:74 Merging adapter weights from adapter_id /data/llama2-lora into model weights.
2024-01-10T09:14:37.375075Z ERROR lorax_launcher: server.py:235 Error when initializing model
Information
- [X] Docker
- [ ] The CLI directly
Tasks
- [X] An officially supported command
- [ ] My own modifications
Reproduction
volume=/home/user/data docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data -it ghcr.io/predibase/lorax:latest --model-id /data/Llama-2-7b-chat-hf --adapter-id /data/llama2-lora --max-input-length 1023 --max-total-tokens 1024 --max-batch-total-tokens 1024 --max-batch-prefill-tokens 1024
Expected behavior
I trained this LoRA on my local Llama 2 model, so why is it not compatible?
I also tried another way:
volume=/home/user/data docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data -it ghcr.io/predibase/lorax:latest --model-id /data/Llama-2-7b-chat-hf --max-input-length 1023 --max-total-tokens 1024 --max-batch-total-tokens 1024 --max-batch-prefill-tokens 1024
check.py:
from lorax import Client

client = Client("http://127.0.0.1:8080")

prompt = "[INST] Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May? [/INST]"
print(client.generate(prompt, max_new_tokens=64).generated_text)

adapter_id = "/data/llama2-lora"
adapter_source = "local"
print(client.generate(prompt, max_new_tokens=64, adapter_id=adapter_id, adapter_source=adapter_source).generated_text)
python check.py
To find out how many clips Natalia sold altogether in April and May, we need to use the information given in the problem.
In April, Natalia sold clips to 48 of her friends. So, she sold a total of 48 clips in April.
In
Traceback (most recent call last):
File "check.py", line 12, in
Maybe it looks like this: #51
@abhibst I have already tried this solution, but it still errors:

python check.py
To find out how many clips Natalia sold altogether in April and May, we need to use the information given in the problem.
In April, Natalia sold clips to 48 of her friends. So, she sold a total of 48 clips in April.
In
Traceback (most recent call last):
  File "check.py", line 12, in <module>
    print(client.generate(prompt, max_new_tokens=64, adapter_id=adapter_id, adapter_source=adapter_source).generated_text)
  File "/home/azureuser/anaconda3/lib/python3.8/site-packages/lorax/client.py", line 157, in generate
    raise parse_error(resp.status_code, payload)
lorax.errors.GenerationError: Request failed during generation: Server error: Incorrect path_or_model_id: '/new-model/llama2-7b/Llama-2-7b-chat-hf'. Please provide either the path to a local folder or the repo_id of a model on the Hub.
Hey @Senna1960321, sorry for the late reply!
For the first error you saw:
ValueError: Adapter '/data/llama2-lora' is not compatible with model '/data/Llama-2-7b-chat-hf'. Use --model-id '/new-model/llama2-7b/Llama-2-7b-chat-hf' instead.
This suggests you're running an older version of LoRAX. The error message was changed to a warning in #58. Can you try running docker pull ghcr.io/predibase/lorax:latest to get the latest image?
If you're still running into issues after that, then for the more recent errors, can you share the output of the following commands run from outside the container?
ls /home/user/data/Llama-2-7b-chat-hf
ls /home/user/data/llama2-lora
The error message is odd because it seems to suggest that it's looking for a model with path /new-model/llama2-7b/Llama-2-7b-chat-hf.
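One likely source for that path is the adapter itself: PEFT-style adapters record the base model they were trained against in adapter_config.json under base_model_name_or_path, and that value appears to be what the compatibility check compares against --model-id. Here is a minimal sketch for inspecting it from outside the container; the host paths below are assumptions taken from the reproduction command, so adjust them to your setup:

```python
import json
from pathlib import Path

# Assumed host-side paths, taken from the reproduction command above;
# adjust them to wherever the model and adapter actually live.
model_dir = Path("/home/user/data/Llama-2-7b-chat-hf")
adapter_dir = Path("/home/user/data/llama2-lora")

# PEFT adapters record the base model they were trained against in
# adapter_config.json under "base_model_name_or_path".
adapter_config = json.loads((adapter_dir / "adapter_config.json").read_text())
print("adapter base_model_name_or_path:", adapter_config.get("base_model_name_or_path"))
print("base model directory exists:", model_dir.is_dir())
```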
@tgaddair Thanks for your reply. I solved this problem by setting volume=new-model/llama2-7b and then running:

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/new-model/llama2-7b -it ghcr.io/predibase/lorax:latest --model-id /new-model/llama2-7b/Llama-2-7b-chat-hf --adapter-id /new-model/llama2-7b/llama2-lora

I fine-tuned this LoRA from Llama-2-7b-chat-hf at the path /new-model/llama2-7b/Llama-2-7b-chat-hf; I don't know why LoRAX only recognizes this path.

I have another question: when I use LoRAX, I find its inference answers are worse than with the normal way, even on the dataset I had already fine-tuned on, though the decrease in the quality of the generated responses is not that pronounced.
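For reference, the path requirement most likely comes from the adapter's own adapter_config.json: the base_model_name_or_path recorded there at training time was presumably /new-model/llama2-7b/Llama-2-7b-chat-hf, which would explain why LoRAX only accepted that location. An untested sketch of an alternative workaround, assuming the adapter sits at the host path below, is to rewrite that same field (see the inspection snippet above) so it matches the model path as mounted inside the container, letting any volume layout work:

```python
import json
from pathlib import Path

# Assumed host-side location of the adapter; adjust to your setup and
# back the file up before editing it.
adapter_config_path = Path("/home/user/data/llama2-lora/adapter_config.json")

config = json.loads(adapter_config_path.read_text())
print("recorded base model:", config.get("base_model_name_or_path"))

# Point the adapter at the base model path as mounted inside the container
# (e.g. /data/Llama-2-7b-chat-hf when using -v $volume:/data), so that
# --model-id and the adapter agree without restructuring the volume.
config["base_model_name_or_path"] = "/data/Llama-2-7b-chat-hf"
adapter_config_path.write_text(json.dumps(config, indent=2))
```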