Ashutosh Baheti

20 comments by Ashutosh Baheti

This is not an exact replication of the FSDP version, but I recently reimplemented DPO with QLoRA on the LLaMA-7B model. The model is already available on Hugging Face: https://huggingface.co/abaheti95/dpo_qlora_hh Here is the respective...

I'm not sure either. I would also love to know whether there is a large difference. Let me know if you notice anything.

I tried using their trainer first but noticed very slow training. I asked the TRL maintainers about this and they gave some pointers on how to speed it up: https://github.com/huggingface/trl/issues/729. By...

In my DPO training attempt, I also saw the margin increasing while both the chosen and rejected rewards decreased.
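To make that observation concrete, here is a minimal sketch of the standard DPO quantities (implicit rewards, margin, and loss) for a single preference pair. The function name `dpo_stats`, the `beta` value, and all log-probabilities are made up for illustration; they are not taken from any actual training run.

```python
import math

def dpo_stats(policy_chosen_logp, policy_rejected_logp,
              ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Return DPO implicit rewards, margin, and loss for one pair.

    Inputs are summed sequence log-probabilities (hypothetical values).
    """
    # Implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # DPO loss: -log(sigmoid(margin)) == log(1 + exp(-margin))
    loss = math.log1p(math.exp(-margin))
    return {
        "chosen_reward": chosen_reward,
        "rejected_reward": rejected_reward,
        "margin": margin,
        "loss": loss,
    }

# Illustrative numbers only: the reference model is fixed, the policy drifts.
ref_chosen, ref_rejected = -10.0, -12.0
early = dpo_stats(-10.5, -13.0, ref_chosen, ref_rejected)
later = dpo_stats(-12.0, -18.0, ref_chosen, ref_rejected)

for name, stats in (("early", early), ("later", later)):
    print(name, {k: round(v, 4) for k, v in stats.items()})
```

In this toy example the policy assigns lower log-probability to both responses over time, so both rewards fall, but the rejected reward falls faster. The loss depends only on the margin, so it keeps improving even though the chosen reward is going down.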

I recently reimplemented DPO with QLoRA on the LLaMA-7B model. The model is already available on Hugging Face: https://huggingface.co/abaheti95/dpo_qlora_hh Here is the code for the implementation: https://github.com/abaheti95/LoL-RL/blob/main/dpo_qlora_llama_hh.py I hope this helps.

Hello. Sorry I didn't notice this issue earlier. Can you explain at which step you're getting the error?

Hmm. I guess I added packages to the environment incrementally, and that resulted in some dependency issues. I don't think the two problematic packages need to be installed. Spacy will...

You can share any dependency errors that you come across that you couldn't resolve. I can try to look into them.

Hello @skywalker023, were you able to resolve this problem? If so, I'd really appreciate it if you could share what changes you had to make.

Thank you for sharing this. Good luck with your deadlines!