
${DIR}/../data/OIG/files/unified_ni.jsonl:0.2

Open wallon-ai opened this issue 1 year ago • 2 comments

Could you please explain what the 0.2 means? Can I add my own data to the DATASETS? If so, how should I do it? Thanks so much!

wallon-ai, Mar 16 '23 02:03

It is the sampling weight used when building training batches from multiple data sources. E.g. with data_a:0.2,data_b:0.8, each sequence has a 20% chance of being sampled from data_a and an 80% chance of being sampled from data_b. These weights are normalized internally, so they do not need to sum to 1.

LorrinWWW, Mar 17 '23 16:03
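Below is a minimal Python sketch of the weighting scheme described above. It assumes a comma-separated `path:weight` spec like the one in the issue title. The helpers `parse_dataset_spec`, `normalize`, and `sample_sources` are hypothetical illustrations and are not OpenChatKit's actual data loader.

```python
import random
from collections import Counter


def parse_dataset_spec(spec: str) -> list[tuple[str, float]]:
    """Parse 'path_a:0.2,path_b:0.8' into [(path, weight), ...] (hypothetical format helper)."""
    sources = []
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Split on the LAST colon so the weight is always the final field.
        path, weight = entry.rsplit(":", 1)
        sources.append((path, float(weight)))
    return sources


def normalize(weights: list[float]) -> list[float]:
    """Rescale weights so they sum to 1 (the 'normalized internally' step)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return [w / total for w in weights]


def sample_sources(spec: str, n: int, seed: int = 0) -> list[str]:
    """Pick the source of each of n sequences according to the normalized weights."""
    sources = parse_dataset_spec(spec)
    paths = [path for path, _ in sources]
    probs = normalize([weight for _, weight in sources])
    rng = random.Random(seed)
    return rng.choices(paths, weights=probs, k=n)


if __name__ == "__main__":
    picks = sample_sources("data_a:0.2,data_b:0.8", n=10_000)
    counts = Counter(picks)
    print({path: counts[path] / len(picks) for path in counts})  # roughly 0.2 / 0.8
```

Because of the normalization, data_a:1,data_b:4 yields the same 20%/80% split as data_a:0.2,data_b:0.8. Under this assumed format, adding your own data would mean appending another path:weight entry to the list.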

Thanks, @LorrinWWW. Let's add this to the training README?

csris, Mar 18 '23 05:03