
Delta releases are gone?

Open • protonicage opened this issue 1 month ago • 1 comment

So Mozilla launched a new way to download the Common Voice dataset, which is fine. However, I would like to know how I can get a delta release now.

Why?

  1. They are smaller, and in general I don't need to download the whole dataset, which can be quite big.
  2. Even more important: in the age of AI, knowing when data was produced can be key to making sure it is NOT included in a model's training material, for example when benchmarking models.

I hope these delta segments still exist, since there is no way to see when a file was added to the dataset when you download it as a whole (which also generates unnecessary traffic).

Can someone enlighten me on this topic? Am I too stupid to find the delta releases, or are they just gone with the new UI?

protonicage, Nov 25 '25 10:11