Merging two FFCV .beton datasets

Open manideep2510 opened this issue 1 year ago • 3 comments

Hi

Let's say we have two datasets written to two FFCV dataset files, dataset1.beton and dataset2.beton. Assume both datasets have exactly the same image and label format and were written using the same method. Is it possible to take these two .beton files and merge them into a single FFCV dataset file, merged-dataset.beton?

This would be very useful when we want to keep growing the training dataset through continuous data collection.

Thanks!

manideep2510 avatar Nov 03 '22 07:11 manideep2510
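The thread does not point to a built-in merge for .beton files. One possible workaround, sketched below under assumptions, is to go back to the original sample sources, present them as one indexed dataset, and write a fresh file from that. `ConcatIndexed` is a hypothetical helper, not part of FFCV, and the field names `image` and `label` in the commented-out writer call are assumed; the toy tuples only stand in for real (image, label) samples.

```python
class ConcatIndexed:
    """Present several indexed datasets as one (hypothetical helper, not part of FFCV)."""

    def __init__(self, *sources):
        self.sources = sources

    def __len__(self):
        return sum(len(s) for s in self.sources)

    def __getitem__(self, idx):
        if idx < 0 or idx >= len(self):
            raise IndexError(idx)
        # Walk the sources in order, shifting the index past each one.
        for source in self.sources:
            if idx < len(source):
                return source[idx]
            idx -= len(source)


# Toy stand-ins for the two original (image, label) sources.
old_samples = [("img_a", 0), ("img_b", 1)]
new_samples = [("img_c", 1)]
merged = ConcatIndexed(old_samples, new_samples)

# With FFCV installed and real image samples, the merged source could then be
# written once (field names are assumptions):
#   from ffcv.writer import DatasetWriter
#   from ffcv.fields import RGBImageField, IntField
#   writer = DatasetWriter("merged-dataset.beton",
#                          {"image": RGBImageField(), "label": IntField()})
#   writer.from_indexed_dataset(merged)
```

This re-encodes every sample, so it costs a full rewrite each time new data arrives; it does not append to an existing .beton file.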

+1

It would be very helpful in production systems

IlyaMescheryakov1402 avatar Jan 29 '23 09:01 IlyaMescheryakov1402

+1 Maybe this could be done by passing more than just one fname to a Loader. Indexing for the Loader could be extended over both .beton files?

kschuerholt avatar May 16 '23 18:05 kschuerholt
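The indexing idea in the comment above can be illustrated with a small sketch. This is not an existing Loader feature; `locate` is a hypothetical function that only shows the bookkeeping needed to map one index over the combined dataset to a file and a position inside it. The sample counts are made up.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(global_idx, sizes):
    """Map an index over the combined dataset to (file_number, local_index)."""
    ends = list(accumulate(sizes))  # cumulative end offsets, one per file
    if not ends or not 0 <= global_idx < ends[-1]:
        raise IndexError(global_idx)
    # First file whose end offset is strictly greater than global_idx.
    file_number = bisect_right(ends, global_idx)
    start = ends[file_number - 1] if file_number else 0
    return file_number, global_idx - start


# Hypothetical sample counts for dataset1.beton and dataset2.beton.
sizes = [5, 3]
first_of_second_file = locate(5, sizes)
```

A loader built this way would leave both files untouched, at the cost of keeping several files open and translating every index at read time.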

+1 Any update on this issue? This feature would be very useful for our use case as well.

AlexSunNik avatar Aug 23 '23 01:08 AlexSunNik