
Could you please provide a processed binary loader of the Terabyte dataset?

Open 12eadp001 opened this issue 3 years ago • 3 comments

The Terabyte dataset is very large and hard to preprocess, but the inference task needs only the last day's data, which is relatively small and manageable. Many people, including me, cannot work with the full dataset because of hardware limitations, so I am wondering if you could provide a saved binary loader for us. Thanks.

12eadp001 avatar Aug 16 '21 09:08 12eadp001

This is exactly what I have been wondering. Processing the whole 1.1 TB of data only to get the last day as the test set is too expensive; I failed a few times and finally had to switch to the subsampled version. I hope the MLCommons community can help make this experiment more accessible.

EtoDemerzel0427 avatar Aug 16 '21 13:08 EtoDemerzel0427

@mnaumovfb Can you please take a look at the issue and comment?

rnaidu02 avatar Oct 05 '21 16:10 rnaidu02

We need Criteo to allow the last day of logs to be preprocessed by MLCommons and shared with the inference WG.

rnaidu02 avatar Nov 09 '21 17:11 rnaidu02

outdated

mrasquinha-g avatar May 23 '23 10:05 mrasquinha-g