
prompt_style

Open fireyanci opened this issue 1 year ago • 1 comments

I don't want to use the dataset styles listed in prompt styles: Dict. I want to use my own dataset style. How can I build my own dataset style for use with finetune/lora? My dataset style is: { "conversation": [ { "system": "This is like an instruction", "input": "", "output": "" } ] }

fireyanci May 13 '24 16:05

Because I want to use multi-round conversation data.

fireyanci May 14 '24 08:05

I think the easiest way here would be to use one of the existing datasets as a template. I remember that Deita had multi-turn questions in the dataset, so I added this as an option. Maybe this is helpful as a template for building your own dataset:

https://github.com/Lightning-AI/litgpt/blob/cbbe9cda9f0ad535c471460236cfdb0b4f50e2bd/litgpt/data/deita.py#L29

But note that LitGPT otherwise doesn't do anything special for multi-turn data. It basically treats each multi-turn example as another regular input example during training.
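
To make the "regular input example" point concrete, here is a minimal sketch of flattening the conversation format from the question into independent samples. It is not taken from the LitGPT code base: the function name `flatten_conversations`, the `multi_turn` flag, and the mapping of `system` to an `instruction` field are assumptions made for illustration only.

```python
from typing import Dict, List


def flatten_conversations(records: List[dict], multi_turn: bool = True) -> List[Dict[str, str]]:
    """Turn {"conversation": [...]} records into flat instruction samples.

    Hypothetical helper: each turn becomes its own sample, so earlier
    turns are NOT carried along as context for later ones.
    """
    samples = []
    for record in records:
        turns = record.get("conversation", [])
        if not multi_turn:
            turns = turns[:1]  # keep only the first turn of each conversation
        for turn in turns:
            samples.append({
                "instruction": turn.get("system", ""),  # assumed mapping
                "input": turn.get("input", ""),
                "output": turn.get("output", ""),
            })
    return samples


records = [{"conversation": [
    {"system": "You are a helpful assistant.", "input": "Hi", "output": "Hello!"},
    {"system": "", "input": "What is LoRA?", "output": "A low-rank adapter method."},
]}]
flat = flatten_conversations(records)
print(len(flat))
```

If later turns need to see earlier ones, the history would have to be folded into the `input` text explicitly before training, since each sample here is treated on its own.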

rasbt May 20 '24 22:05

Thank you very much for your reply. I've read your explanation of DoRA, and it's excellent. Thank you. I hope to use it in the LitGPT project.

fireyanci May 21 '24 11:05

Glad to hear you found it useful! There are currently so many todos, but yeah, adding DoRA to LitGPT some time would be great.

rasbt May 21 '24 14:05