OpenChatKit
${DIR}/../data/OIG/files/unified_ni.jsonl:0.2
Could you please tell me what the 0.2 means? Can I add my own data to DATASETS? If so, how should I do it? Thanks so much!
These are the sampling weights used to build training batches from multiple data sources. E.g. with data_a:0.2,data_b:0.8, each sequence has a 20% chance of being sampled from data_a and an 80% chance of being sampled from data_b. These weights are normalized internally, so they do not need to sum to 1.
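To make the weighting concrete, here is a minimal Python sketch of how a spec like path:weight,path:weight could be parsed, normalized, and used to pick a source for each sequence. This is not the actual OpenChatKit code: the helper names parse_datasets and sample_source are hypothetical, and it assumes every entry ends in a ":weight" suffix.

```python
import random


def parse_datasets(spec):
    """Parse 'path_a:1,path_b:4' into (paths, normalized_weights).

    Hypothetical helper: assumes every entry has the form 'path:weight'.
    """
    paths, weights = [], []
    for entry in spec.split(","):
        # Split on the LAST colon so paths that contain ':' still work.
        path, _, weight = entry.strip().rpartition(":")
        paths.append(path)
        weights.append(float(weight))
    total = sum(weights)
    # Normalize so the weights sum to 1; 1 and 4 become 0.2 and 0.8.
    return paths, [w / total for w in weights]


def sample_source(paths, weights, rng):
    """Pick the data source for the next sequence according to the weights."""
    return rng.choices(paths, weights=weights, k=1)[0]


paths, weights = parse_datasets("data_a:1,data_b:4")
rng = random.Random(0)  # fixed seed so the run is repeatable

counts = {p: 0 for p in paths}
for _ in range(10_000):
    counts[sample_source(paths, weights, rng)] += 1

print(weights)  # [0.2, 0.8]
print(counts)   # roughly 20% data_a, 80% data_b
```

Because of the normalization step, data_a:1,data_b:4 and data_a:0.2,data_b:0.8 describe the same mixture.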
Thanks, @LorrinWWW. Let's add this to the training README?