mmf
Temperature scaled sampling for multi dataset
🚀 Feature
Temperature scaled sampling for multi dataset
Motivation
In multitask training involving multiple datasets, it is often desirable to control the sampling ratios for the different datasets. Currently mmf provides either equal or size-proportional sampling. To have more control over the sampling, we should add temperature-scaled sampling.
Pitch
Currently multi_dataset_loader.py implements two sampling strategies for multi-dataset training: equal and proportional. To have more control over the sampling ratios for multiple datasets, we can add a temperature (T) scaling capability when deciding the proportions of the different datasets to be used. At one extreme, T=1 is the same as the current proportional sampling. As T increases, the sampling becomes more and more equal. Reference: Google T5
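As a rough illustration of the idea in the pitch, here is a minimal sketch that assumes the usual formulation: each dataset's size-proportional rate is raised to the power 1/T and the results are renormalized. The function name `temperature_scaled_probs` and the example sizes are hypothetical and are not part of mmf.

```python
def temperature_scaled_probs(dataset_sizes, temperature=1.0):
    """Return sampling probabilities proportional to rate ** (1 / temperature).

    Hypothetical helper for illustration only; not an mmf API.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total = sum(dataset_sizes)
    if total <= 0:
        raise ValueError("dataset sizes must sum to a positive number")
    # Size-proportional rates, i.e. what T=1 reproduces exactly.
    rates = [size / total for size in dataset_sizes]
    # Raising to 1/T flattens the distribution as T grows.
    scaled = [rate ** (1.0 / temperature) for rate in rates]
    norm = sum(scaled)
    return [value / norm for value in scaled]


if __name__ == "__main__":
    sizes = [900, 100]  # made-up dataset sizes
    for t in (1.0, 2.0, 100.0):
        probs = temperature_scaled_probs(sizes, t)
        print(t, [round(p, 3) for p in probs])
```

With these made-up sizes, T=1 keeps the 0.9 / 0.1 proportional split, T=2 gives roughly 0.75 / 0.25, and a very large T approaches an equal split.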
Additional context
Temperature-scaled sampling is often required during multitask training; see Google's T5 paper for reference. The task involves adding a configurable temperature parameter for sampling datasets during multi-dataset training.
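To make the "configurable temperature parameter" concrete, here is one possible shape for it: a small config object plus a callable that returns the index of the dataset to draw the next batch from. Everything here (`TemperatureSamplingConfig`, `TemperatureScaledSampler`, the `seed` field) is a hypothetical, standalone sketch and does not use or mirror mmf's actual classes.

```python
import random
from dataclasses import dataclass


@dataclass
class TemperatureSamplingConfig:
    # Hypothetical config; field names are assumptions, not mmf options.
    temperature: float = 1.0
    seed: int = 0


class TemperatureScaledSampler:
    """Pick a dataset index with probability proportional to size ** (1 / T)."""

    def __init__(self, config, dataset_sizes):
        if config.temperature <= 0:
            raise ValueError("temperature must be positive")
        weights = [size ** (1.0 / config.temperature) for size in dataset_sizes]
        total = sum(weights)
        self.probabilities = [weight / total for weight in weights]
        # A private RNG keeps sampling reproducible for a given seed.
        self._rng = random.Random(config.seed)

    def __call__(self):
        indices = range(len(self.probabilities))
        return self._rng.choices(indices, weights=self.probabilities, k=1)[0]


if __name__ == "__main__":
    config = TemperatureSamplingConfig(temperature=2.0)
    sampler = TemperatureScaledSampler(config, dataset_sizes=[900, 100])
    draws = [sampler() for _ in range(10)]
    print(sampler.probabilities, draws)
```

Computing the probabilities once in the constructor keeps each call cheap; a real implementation would likely also need to handle datasets whose sizes are unknown or zero.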
@vedanuj @apsdehal any update on this? I'd like to work on this if no one is already on it and if it is still needed since we now have the option of specifying the sampling ratios for each dataset.
Hi, @vedanuj @apsdehal any update on this? I'd like to work on this and this could be my first issue
@parthduggal Thanks for your interest. This can be added as a form of iteration strategy. Take a look at https://github.com/facebookresearch/mmf/blob/master/mmf/datasets/iteration_strategies.py
@apsdehal , if I'm not wrong, I have to add temperature scaled sampling to the current iteration strategies in the format of the other iteration strategies that you showed in the link?
May I work on this?
@shinobi-AI Since @parthduggal is already working on this, can you check any other issue?
take