
Estimate total shards at the beginning of data conversion

Open · abhijithneilabraham opened this issue 1 year ago · 1 comment

🚀 Feature Request

The number of shards that will be created, estimated from size_limit and the total data size, would be a useful metric to have at the start of conversion.

Motivation

If other features, such as resuming an interrupted data conversion, are implemented in the future, they could be built on top of this estimate.

[Optional] Implementation
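
A minimal sketch of the estimate described above, assuming the total serialized size of the dataset is already known in bytes. The function name `estimate_num_shards` and the `total_bytes` parameter are hypothetical and not part of the streaming library; only `size_limit` comes from the request. The real shard count can differ, for example if shards are closed before they reach the limit or if compression changes the on-disk size.

```python
def estimate_num_shards(total_bytes: int, size_limit: int) -> int:
    """Estimate how many shards a conversion would produce.

    total_bytes: assumed total serialized dataset size in bytes (hypothetical input).
    size_limit: maximum size of a single shard in bytes.
    """
    if size_limit <= 0:
        raise ValueError("size_limit must be a positive number of bytes")
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    # Ceiling division in integer arithmetic, so a partial last shard counts as one.
    return (total_bytes + size_limit - 1) // size_limit


if __name__ == "__main__":
    shard_limit = 1 << 26            # 64 MiB per shard, chosen for illustration
    dataset_size = 10 * shard_limit + 1
    print(estimate_num_shards(dataset_size, shard_limit))
```

The estimate is only as good as `total_bytes`, which the writer would need to obtain from the caller or from a pre-pass over the data.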

Additional context

abhijithneilabraham · Aug 03 '24 23:08

Hey @abhijithneilabraham, thanks for this issue! How would you propose finding the dataset size ahead of time? MDSWriter currently has no knowledge of how large your raw dataset files are, or of how it is being used to iterate over your original dataset...

snarayan21 · Sep 16 '24 14:09