dsir
DSIR: a large-scale data selection framework for language model training
How do you calculate the KL reduction for the dataset feature distribution?
Hi, can you release the code for computing the KL reduction shown in Figure 3 of the paper? Thank you very much!
Request to release the code for DSIR with a neural importance weight estimator
Hi, we followed the training pipeline in `experimental` to replicate the DSIR results. However, our average performance reached only 81.05, significantly below the reported benchmark of 82.30. Are there any...