Ashutosh Baheti comments

Results: 20 comments by Ashutosh Baheti

llama7B issue

This is not an exact replication of the FSDP version, but I recently reimplemented DPO with QLoRA on the LLaMA-7B model. The model is already available on Hugging Face: https://huggingface.co/abaheti95/dpo_qlora_hh Here is the respective...

llama7B issue

I'm not sure either. I would also love to know whether there is a large difference. Let me know if you notice anything.

llama7B issue

I tried their trainer first but noticed very slow training. I asked the TRL maintainers about this, and they gave some pointers on how to speed it up: https://github.com/huggingface/trl/issues/729. By...

llama7B issue

In my DPO training attempt, I also saw the margin increase while both the chosen and the rejected rewards decreased.
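To make that observation concrete, here is a minimal sketch of how the DPO implicit rewards and margin are usually computed from sequence log-probabilities. The function name `dpo_stats`, the value `beta=0.1`, and all log-probability values are assumptions made up for illustration; they are not taken from any training run. The sketch only shows that the margin can grow while both rewards fall, as long as the rejected reward falls faster.

```python
import math

def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Compute DPO implicit rewards, margin, and loss for one preference pair.

    Inputs are summed sequence log-probabilities under the policy and the
    frozen reference model. `beta` is a hypothetical choice for this sketch.
    """
    # Implicit reward: scaled log-ratio between policy and reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # DPO loss is -log(sigmoid(margin)), written here as log(1 + exp(-margin)).
    loss = math.log1p(math.exp(-margin))
    return {
        "chosen_reward": chosen_reward,
        "rejected_reward": rejected_reward,
        "margin": margin,
        "loss": loss,
    }

# Illustrative (invented) log-probabilities at two points in training.
# The reference model is frozen, so its values stay the same.
early = dpo_stats(policy_chosen_logp=-101.0, policy_rejected_logp=-113.0,
                  ref_chosen_logp=-100.0, ref_rejected_logp=-110.0)
later = dpo_stats(policy_chosen_logp=-105.0, policy_rejected_logp=-125.0,
                  ref_chosen_logp=-100.0, ref_rejected_logp=-110.0)

for name, stats in (("early", early), ("later", later)):
    print(name, {key: round(value, 3) for key, value in stats.items()})
```

In this toy setting the policy assigns lower log-probability to both responses over time, so both rewards drop, yet the loss still improves because it depends only on the margin.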

Is fine tuning with e.g., LORA supported?

I recently reimplemented DPO with QLoRA on the LLaMA-7B model. The model is already available on Hugging Face: https://huggingface.co/abaheti95/dpo_qlora_hh Here is the code for the implementation: https://github.com/abaheti95/LoL-RL/blob/main/dpo_qlora_llama_hh.py I hope this helps.

Setting up the environment

Hello. Sorry I didn't notice this issue earlier. Can you explain at which step you're getting the error?

Setting up the environment

Hmm. I guess I added packages to the environment incrementally, and that resulted in some dependency issues. I think the two problematic packages don't need to be installed. Spacy will...

Setting up the environment

Feel free to share any dependency errors you come across and can't resolve. I can try to look into them.

Setting up the environment

Hello @skywalker023, were you able to resolve this problem? If so, I'd really appreciate it if you could share what changes you had to make.

Setting up the environment

Thank you for sharing this. Good luck with your deadlines!