
question on training pipeline

Open tiger241 opened this issue 1 year ago • 0 comments

Is your feature request related to a problem? Please describe.

Hi, is there a reason or motivation behind the sampling probabilities in the data files? I am interested in building a chat/instruct bot and am considering training with a new set of probabilities.

It seems to me that you are doing weighted sampling across tasks. Is this meant to mix the tasks randomly during training? Is there a principled way to choose the probabilities if we are only interested in a subset of those tasks? Or is this just a heuristic that happened to work?
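For reference, here is a minimal sketch of weighted task sampling in general. It is not taken from the OpenChatKit code, and the helper name `weighted_task_stream`, the task names, the data, and the probabilities are all hypothetical. Each draw first picks a task according to its probability and then takes that task's next example, so the probabilities set the task mix independently of how large each dataset is.

```python
import random
from collections import Counter
from typing import Dict, Iterator, List, Tuple


def weighted_task_stream(
    tasks: Dict[str, List[str]],
    probs: Dict[str, float],
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (task_name, example) pairs forever.

    Hypothetical helper, not OpenChatKit's API: each draw picks a task with
    probability proportional to probs[task], then returns the next example
    from that task, wrapping around so small tasks are repeated.
    """
    rng = random.Random(seed)
    names = list(tasks)
    weights = [probs[name] for name in names]  # need not sum to 1
    positions = {name: 0 for name in names}
    while True:
        name = rng.choices(names, weights=weights, k=1)[0]
        examples = tasks[name]
        i = positions[name]
        positions[name] = (i + 1) % len(examples)
        yield name, examples[i]


if __name__ == "__main__":
    # Hypothetical datasets of very different sizes.
    tasks = {
        "chat": [f"chat-{i}" for i in range(100)],
        "qa": [f"qa-{i}" for i in range(10)],
        "code": [f"code-{i}" for i in range(50)],
    }
    probs = {"chat": 0.6, "qa": 0.3, "code": 0.1}
    stream = weighted_task_stream(tasks, probs, seed=42)
    n = 10_000
    counts = Counter(name for name, _ in (next(stream) for _ in range(n)))
    # Empirical task frequencies should be close to probs.
    print({name: counts[name] / n for name in tasks})
```

Note that "qa" has only 10 examples but is drawn about 30% of the time, so it is effectively upsampled; this is one reason the weights matter when task sizes differ a lot.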

Describe the solution you'd like

How can I make a general instruct bot oriented towards a subset of the tasks (not all of the tasks mentioned)? In other words, a more targeted fine-tuning, if that makes sense.
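Two simple ways to get weights for a subset of tasks are sketched below. Neither is confirmed to be what OpenChatKit did; both helper names and all numbers are hypothetical. `renormalize_subset` keeps the existing weights for the chosen tasks and rescales them to sum to 1. `temperature_mixing` is a swapped-in heuristic (temperature-scaled mixing, as used in T5-style multi-task training): weights are proportional to `size ** (1 / T)`, so `T = 1` samples in proportion to dataset size and larger `T` moves toward uniform sampling across tasks.

```python
from typing import Dict, Iterable


def renormalize_subset(probs: Dict[str, float], keep: Iterable[str]) -> Dict[str, float]:
    """Keep only the chosen tasks and rescale their weights to sum to 1."""
    kept = {name: probs[name] for name in keep}
    total = sum(kept.values())
    if total <= 0:
        raise ValueError("selected tasks have zero total weight")
    return {name: w / total for name, w in kept.items()}


def temperature_mixing(sizes: Dict[str, int], temperature: float = 1.0) -> Dict[str, float]:
    """Weights proportional to size ** (1 / temperature).

    temperature == 1 gives size-proportional sampling; larger values
    flatten the distribution toward uniform across tasks.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {name: n ** (1.0 / temperature) for name, n in sizes.items()}
    total = sum(scaled.values())
    return {name: s / total for name, s in scaled.items()}


if __name__ == "__main__":
    probs = {"chat": 0.5, "qa": 0.3, "code": 0.2}  # hypothetical weights
    # Drop "code": chat and qa become about 0.625 and 0.375.
    print(renormalize_subset(probs, ["chat", "qa"]))

    sizes = {"chat": 90_000, "qa": 10_000}  # hypothetical example counts
    print(temperature_mixing(sizes, 1.0))  # about 0.9 / 0.1
    print(temperature_mixing(sizes, 2.0))  # about 0.75 / 0.25
```

Either output can be fed to a weighted sampler as the per-task probabilities; the temperature is still a knob to tune on validation data rather than a value that can be derived.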

Describe alternatives you've considered

I am not sure about an alternative. The papers are not very clear about this.

Additional context

None

tiger241 · Apr 03 '23 21:04