Ensheng Shi (石恩升)


I think you are right. When I run `./data_utils.py`, the error is: `FileNotFoundError: [Errno 2] No such file or directory: '/data/enshi_data/Chinese-Poetry-Generation/data/sxhy_dict.txt'`. When I make a file named data in the...
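A minimal sketch of the workaround being discussed, assuming the script only needs the `data/` directory to exist so it can generate `sxhy_dict.txt` inside it (the directory name is taken from the path in the error message):

```python
import os

# Hypothetical workaround: create the missing data/ directory next to
# data_utils.py before running it, so the script can write sxhy_dict.txt there.
os.makedirs("data", exist_ok=True)
```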

+1. The same error occurs when running on Ubuntu: ```RuntimeError: probability tensor contains either `inf`, `nan` or element < 0```

Fixed by reinstalling bitsandbytes (refer to https://github.com/TimDettmers/bitsandbytes/issues/134):

```
pip uninstall bitsandbytes
pip install bitsandbytes
```

Another issue: ```RuntimeError: probability tensor contains either `inf`, `nan` or element < 0```

![image](https://user-images.githubusercontent.com/34572118/224756950-3bef51d3-db60-48d4-8e20-76e258db636e.png)
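For reference, a minimal sketch of how this error typically surfaces: `torch.multinomial` raises it when the sampling probabilities contain `NaN`/`Inf`, for example after the logits overflow in half precision. The tensor values below are made up purely for illustration:

```python
import torch

# Softmax over logits containing NaN propagates NaN into the probabilities,
# and torch.multinomial then rejects the probability tensor at sampling time.
logits = torch.tensor([float("nan"), 1.0, 2.0])
probs = torch.softmax(logits, dim=-1)

try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as e:
    print(e)  # probability tensor contains either `inf`, `nan` or element < 0
```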

@tloen Hi Eric J. Do you have any considerations or motivations for adding the above code snippets?

> It's interesting, my alpaca run produced a 36mb file, and had really good results. Then, when I merged it and tried to finetune my own custom dataset, the model...

Update on the training loss: I fine-tuned llama-7b with LoRA on alpaca_data.json three times. The training losses are shown below.

![image](https://user-images.githubusercontent.com/34572118/231461452-05c28950-0d26-468e-8273-8d03e2f433d4.png)
![image](https://user-images.githubusercontent.com/34572118/231461802-03064272-9853-4a97-b2a1-d26d1230e32a.png)
![image](https://user-images.githubusercontent.com/34572118/231461840-43ac303d-ed3e-42bb-ac7b-21845c1b7934.png)

> Trying to remove the script causes CUDA crashes on my end during training

Do you mean commenting out/removing the following code?

```
old_state_dict = model.state_dict
model.state_dict = (
    lambda self, ...
```
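For context, the block being discussed is the `state_dict` override in alpaca-lora's finetune.py, which patches the model so that checkpoint saving only writes the LoRA (peft) weights. The following is a sketch from memory of roughly what it looks like; the exact code in the repository may differ:

```python
from peft import get_peft_model_state_dict

# `model` is the peft-wrapped model built earlier in finetune.py.
# Replace model.state_dict so that saving returns only the LoRA adapter
# weights extracted by peft instead of the full base-model weights.
old_state_dict = model.state_dict
model.state_dict = (
    lambda self, *_, **__: get_peft_model_state_dict(self, old_state_dict())
).__get__(model, type(model))
```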

Thanks! I also saw some issues (https://github.com/tloen/alpaca-lora/issues/288, https://github.com/tloen/alpaca-lora/issues/170) discussing how the GPU type would affect the training process. The root cause may be the peft update rather than the GPU type. I think...

Thanks for sharing! I think one quick solution is to install pinned versions of peft and transformers. For example:

```
pip install git+https://github.com/huggingface/peft.git@xxxxxxx
pip install git+https://github.com/huggingface/transformers.git@xxxx
```

We...
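A quick way to confirm which versions actually ended up installed (both packages expose `__version__`):

```python
import peft
import transformers

# Print the installed versions so they can be compared against a known-good setup.
print("peft:", peft.__version__)
print("transformers:", transformers.__version__)
```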

I commented out the following code and fine-tuned llama-7b with LoRA on alpaca_data.json on 8 V100s. The training loss still became 0 at iteration 560. Is it a problem with multi-card...
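As a side check, a hedged sketch for inspecting whether the saved adapter weights are actually non-zero, since a silently all-zero adapter is a common symptom when the peft version and the `state_dict` patch are out of sync. The path below assumes the default alpaca-lora output directory `./lora-alpaca`; adjust it to your own setup:

```python
import torch

# Load the saved LoRA adapter and report whether each tensor is all zeros;
# an all-zero adapter usually means the LoRA weights were not saved correctly.
state = torch.load("lora-alpaca/adapter_model.bin", map_location="cpu")
for name, tensor in state.items():
    status = "all-zero" if torch.count_nonzero(tensor) == 0 else "non-zero"
    print(f"{name}: {tuple(tensor.shape)} {status}")
```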