alpaca-lora
create test.py
Goal:
- Save time when testing the environment.
- Demonstrate how to fine-tune on a small dataset.
- Observe how the generated text changes at each evaluation; working through this example, you will understand why your output is repetitive and unreasonable.
Features: There are 10 instances in the file, which can be fine-tuned in about 1:30 on an RTX 3090 and about 4 minutes on a Colab T4 (16 GB); the outcome is 100% correct, and I haven't found any overfitting for Llama-7B yet.
Command: python test.py (That's all; the required packages, dataset, generate function, and other components such as the utils and template are all in this single file, so you can paste or upload it to any cloud server and run it.)
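For orientation, the records and prompt in test.py presumably follow the standard Alpaca layout; here is a minimal sketch (the record values are taken from the samples quoted later in this thread, but the exact wording of the template inside test.py is an assumption):

```python
# Minimal sketch of Alpaca-style training records and the prompt template
# (assumed layout; the exact wording used in test.py may differ).
samples = [
    {"instruction": "what is your name?", "input": "",
     "output": "My name is Alpaca lora, I am a LLM chatbot. How may I help you?"},
    {"instruction": "test", "input": "",
     "output": "test completed"},
]

def generate_prompt(instruction: str, input: str = "", output: str = "") -> str:
    """Build an Alpaca prompt; the Response section is left open at inference time."""
    if input:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n{output}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n### Response:\n{output}"
    )
```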
I ran it and got results, some examples of which are below. What could be the issue? I already installed the PEFT and transformers versions that were mentioned.
Sample 2: Instruction: what is your name? Input: Output:My name is Alpaca lora, I am a LLM chatbot. How may I help you?
Predict: <s> what is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is your name? What is => The correct answer should follow the Alpaca template.
Sample 3: Instruction: test Input: Output:test completed
Predict: <s> test_description='Test if a string matches a given pattern' test_name='Test if a string matches a given pattern' test_below='Test if a string matches a given pattern'
# offer description ### describe above test ### ### describe below test ### ### finish test ### ### finish below test ### => The correct answer should follow the Alpaca template.
The results show insufficient training; the model remembers neither the answers nor the template and eos_token_id. How about the other questions, can the model predict at least one correct answer out of the ten?
These answers look like you only ran 10 steps; did you train for 70 steps? Also, the last prints are not from the final model we generated; you need to test it in Gradio. I just ran the code again, here is the link: https://colab.research.google.com/drive/1_mhkjwfo8kafK5v97-chz4ySzZLNHfUH?usp=sharing
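If you would rather check it outside Gradio, here is a rough sketch of loading the saved adapter and generating one answer; the adapter path, base model name, and generation settings are assumptions, not the exact code from test.py:

```python
# Sketch: load the trained LoRA adapter and generate one answer to check that
# the model follows the Alpaca template and stops at eos_token_id.
# Paths and model names are assumptions.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel

base = "yahma/llama-7b-hf"
tokenizer = LlamaTokenizer.from_pretrained(base)
model = LlamaForCausalLM.from_pretrained(base, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, "./test")  # adapter output dir is an assumption
model.eval()

prompt = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\nwhat is your name?\n\n### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=64,
        eos_token_id=tokenizer.eos_token_id,  # generation should stop here if training worked
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))
```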
You're right, thank you!
Note: to be able to use this test script (or alpaca-lora training in general) in the free version of Google Colab, I had to split yahma/llama-7b-hf into more shards because of RAM limitations (a sketch of the re-sharding is included after the generate.py command below). I uploaded the split version as jploski/llama-7b-hf. Still, test.py aborts after training with the following error:
Traceback (most recent call last):
File "/content/test.py", line 730, in <module>
fire.Fire(run)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/content/test.py", line 728, in run
main()
File "/content/test.py", line 576, in main
model = PeftModel.from_pretrained(
File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 189, in from_pretrained
model = dispatch_model(
File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 342, in dispatch_model
raise ValueError(
ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`, the following submodules need to be offloaded: base_model.model.model.layers.13, base_model.model.model.layers.14, base_model.model.model.layers.15, base_model.model.model.layers.16, base_model.model.model.layers.17, base_model.model.model.layers.18, base_model.model.model.layers.19, base_model.model.model.layers.20, base_model.model.model.layers.21, base_model.model.model.layers.22, base_model.model.model.layers.23, base_model.model.model.layers.24, base_model.model.model.layers.25, base_model.model.model.layers.26, base_model.model.model.layers.27, base_model.model.model.layers.28, base_model.model.model.layers.29, base_model.model.model.layers.30, base_model.model.model.layers.31, base_model.model.model.norm, base_model.model.lm_head.
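A likely workaround, judging from the error message, is to give the dispatcher a directory for disk offload; this is a rough sketch, and whether the keyword is `offload_dir` or `offload_folder`, and whether this PEFT version forwards it, are assumptions on my part rather than a verified fix:

```python
# Sketch of a possible workaround, not a verified fix: give accelerate a disk
# offload directory so the submodules listed in the error can be dispatched.
# The keyword name (`offload_dir` vs `offload_folder`) depends on the
# peft/accelerate versions; this is an assumption based on the error message.
import torch
from transformers import LlamaForCausalLM
from peft import PeftModel

base_model = LlamaForCausalLM.from_pretrained(
    "jploski/llama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
    offload_folder="offload",   # let transformers offload base layers to disk as well
)
model = PeftModel.from_pretrained(
    base_model,
    "/content/test/",           # trained LoRA adapter directory
    offload_dir="offload",      # or offload_folder=..., depending on the peft version
)
```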
If I feed the final LoRA weights to generate.py, the outputs in Gradio are almost as expected ("What is your model?" produces "My pre-trained model is Llama-1B, finetuned via LORA." and "Are you overfitting?" produces "Of course nah"), but the others match exactly.
!python generate.py --base_model=jploski/llama-7b-hf --lora_weights=/content/test/ --load_8bit --share_gradio=True
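For reference, the re-sharding mentioned above amounts to re-saving the checkpoint with a smaller maximum shard size; a minimal sketch follows (the shard size and output path are arbitrary choices, and the one-time re-save itself needs a machine with enough RAM to load the original checkpoint):

```python
# Sketch: re-save a checkpoint with smaller shards so it can be loaded within
# Colab's free-tier RAM. Shard size and output path are assumptions.
from transformers import LlamaForCausalLM, LlamaTokenizer

model = LlamaForCausalLM.from_pretrained("yahma/llama-7b-hf", low_cpu_mem_usage=True)
tokenizer = LlamaTokenizer.from_pretrained("yahma/llama-7b-hf")

model.save_pretrained("llama-7b-hf-resharded", max_shard_size="2GB")
tokenizer.save_pretrained("llama-7b-hf-resharded")
```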
I should add that this is a very worthwhile PR, and something like this should be referenced from the front-page documentation. I wasted a LOT of time running into various problems because of limited hardware resources in Colab (which I at first did not realize were too limited!) and because decapoda-research/llama-7b-hf is no longer compatible. I think for beginners it's great to have something that tells you that your setup is correct.
The problem is that as a beginner you don't even really know which outputs to expect or which parameters to run with, and you would like to start with a small example that does not require a big dataset, rented hardware, and hours of training (possibly only to get nothing in the end). It's easy to believe that you yourself are doing something wrong or fundamentally misunderstanding something, rather than it just being a bad configuration. So from an educational viewpoint, this self-contained minimal example script is indeed very interesting.