run_lora_clm.py support for other datasets

Open tmabraham opened this issue 7 months ago • 2 comments

Feature request

Right now the script is hardcoded for either "tatsu-lab/alpaca" or "timdettmers/openassistant-guanaco", and using any other dataset throws an error.

Motivation

It would be nice to be able to finetune on our own datasets.

Your contribution

Happy to test any code out for this...

tmabraham, Jan 09 '24 06:01

Yes, I agree that would be much better. Can you share a command line that fails, along with the error message you get?

regisss, Jan 09 '24 08:01

I posted PR #955, which fixes this issue and allows other datasets to be used with run_lora_clm.py. I tried it with several different datasets to check the functionality, but please let me know if there's a specific one that you'd like me to try.

dmsuehir, May 06 '24 17:05
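
For readers wondering what "support for other datasets" can involve in practice, here is a minimal sketch of one possible approach: mapping arbitrary column names onto a single instruction-style prompt. It is not taken from run_lora_clm.py or from PR #955. The helper name `to_prompt`, its parameters, the prompt layout, and the column names in the example are all assumptions made for illustration.

```python
from typing import Dict, Optional


def to_prompt(
    record: Dict[str, str],
    instruction_col: str = "instruction",
    input_col: Optional[str] = None,
    output_col: str = "output",
) -> Dict[str, str]:
    """Hypothetical helper: turn one dataset row into a prompt/completion pair.

    The column names are parameters so that datasets with a different
    schema can be used without editing the function body.
    """
    instruction = str(record[instruction_col]).strip()

    # The optional context column is only used if it is named and non-empty.
    context = ""
    if input_col is not None and record.get(input_col):
        context = str(record[input_col]).strip()

    if context:
        prompt = (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{context}\n\n"
            f"### Response:\n"
        )
    else:
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"

    return {"prompt": prompt, "completion": str(record[output_col]).strip()}


# Example row from an imaginary dataset whose columns are "question"/"answer".
row = {"question": "What is 2+2?", "answer": "4"}
example = to_prompt(row, instruction_col="question", output_col="answer")
```

If the data is loaded with the Hugging Face `datasets` library, a function like this could be applied to every row through `Dataset.map`. Whether that matches what the script does after the PR would need to be checked against the PR itself.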