instructor-embedding

Discrepancy in training data versions

Open vaibhavad opened this issue 1 year ago • 0 comments

Thank you for the great work and for releasing the datasets and models. I downloaded the MEDI dataset a few months ago, and the length of the dataset in that file is 1435000.

When I downloaded it again today, the dataset size was 1240000.

What is the difference between these two versions? Have some samples been discarded? If so, were they from any specific dataset? Have any new samples been added?

vaibhavad, Jan 30 '24 17:01