dsir
DSIR: a large-scale data selection framework for language model training
How do you calculate the KL reduction for dataset feature distribution?
Hi, can you release the code for computing the KL reduction shown in Figure 3 of the paper? Thank you very much!
Request to release the code for DSIR with a neural importance weight estimator.
Hi, we followed the training pipeline in `experimental` to replicate the DSIR results. However, our average performance reached only 81.05, significantly below the reported benchmark of 82.30. Are there any...
Hi, thanks for your interesting and valuable work. When I run the code, the importance weights vary enormously in scale, from 5.4e-52 to 6.5e40. Is this normal?
Can DSIR compute the KL reduction data metric mentioned in the paper? And what data preprocessing is necessary when resampling a custom dataset? My scenario involves importance...
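Several of the issues above ask how KL reduction might be computed. The following is a minimal sketch, not the authors' released code. It assumes KL reduction means KL(target || raw) minus KL(target || selected) over hashed n-gram feature distributions, for a single target. The function names, the bucket count, the whitespace tokenizer, the CRC32 hash, and the add-one smoothing are all hypothetical choices made for illustration.

```python
import math
import zlib

NUM_BUCKETS = 1000  # hypothetical bucket count, chosen only for this sketch


def hashed_ngram_distribution(texts, num_buckets=NUM_BUCKETS, smoothing=1.0):
    """Estimate a smoothed distribution over hashed unigram and bigram buckets."""
    counts = [smoothing] * num_buckets
    for text in texts:
        tokens = text.lower().split()  # assumption: simple whitespace tokenizer
        ngrams = tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]
        for ngram in ngrams:
            # CRC32 is used here only because it is deterministic across runs.
            bucket = zlib.crc32(ngram.encode("utf-8")) % num_buckets
            counts[bucket] += 1
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(p, q):
    """KL(p || q) in nats; q is strictly positive because of smoothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kl_reduction(target_texts, raw_texts, selected_texts, num_buckets=NUM_BUCKETS):
    """How much closer the selected data is to the target than the raw pool is."""
    p_target = hashed_ngram_distribution(target_texts, num_buckets)
    q_raw = hashed_ngram_distribution(raw_texts, num_buckets)
    p_selected = hashed_ngram_distribution(selected_texts, num_buckets)
    return kl_divergence(p_target, q_raw) - kl_divergence(p_target, p_selected)


if __name__ == "__main__":
    target = ["the cat sat on the mat"]
    raw = ["stock prices fell sharply today"]
    selected = ["the cat sat on the rug"]
    print(kl_reduction(target, raw, selected))
```

A positive value would indicate that the selected subset matches the target feature distribution better than the raw pool does. With several targets, the per-target values could be averaged.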