tensorflow-yolo3

Is there an efficient way to shuffle the data?

Open 1453042287 opened this issue 7 years ago • 4 comments

My input pipeline is:

dataset = dataset.repeat().shuffle(70000).batch(batch_size).prefetch(batch_size)

I tested the shuffle function, and I believe buffer_size determines the maximum index of the original data that can be sampled. My dataset is huge, so when I train the model it gets stuck at 40k+ while filling the buffer, like this:

2018-11-21 21:07:14.170579: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 46287 of 70000
2018-11-21 21:07:24.262936: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 46432 of 70000

After that there are no more logs. Any suggestions would be appreciated!

1453042287 avatar Nov 21 '18 14:11 1453042287
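
One common way around a slow shuffle buffer is to shuffle lightweight keys (indices or file paths) instead of decoded samples, and load each sample only when it is needed. The following is a minimal pure-Python sketch of that idea, not code from this repository: `shuffled_epochs`, `batched`, and the `load` callback are hypothetical names, and it assumes each sample can be fetched independently by its key.

```python
import random
from typing import Callable, Iterable, Iterator, List, TypeVar

K = TypeVar("K")
T = TypeVar("T")


def shuffled_epochs(keys: Iterable[K], load: Callable[[K], T],
                    num_epochs: int, seed: int = 0) -> Iterator[T]:
    """Yield every sample once per epoch, in a fresh random order each epoch.

    Only the keys are shuffled, so memory use does not depend on sample size.
    """
    rng = random.Random(seed)
    order = list(keys)
    for _ in range(num_epochs):
        rng.shuffle(order)      # cheap: shuffles keys, not image data
        for key in order:
            yield load(key)     # the expensive load happens lazily


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream into lists of batch_size (the last one may be shorter)."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


# Hypothetical usage: the keys stand in for 70000 sample IDs and the
# identity function stands in for the real decode/augment step.
stream = shuffled_epochs(range(70000), load=lambda k: k, num_epochs=1)
first_batch = next(batched(stream, 8))
print(len(first_batch))  # 8
```

A generator like this could probably be fed to `tf.data.Dataset.from_generator`, or the same idea applied by shuffling a dataset of file names before the `map` step that decodes them, so that the shuffle buffer holds short strings rather than images.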

I have the same problem. Did you solve it?

forwardwfg avatar Jan 13 '19 11:01 forwardwfg

@WeifaGan not yet :(

1453042287 avatar Jan 14 '19 01:01 1453042287

Change shuffle(70000) to shuffle(1024).

Duferen avatar Jan 17 '19 08:01 Duferen

@Duferen Assume the data IDs are range(0, 70000). If I change 70000 to 1024, I will never get the data whose ID is beyond 1024, so I would be using only a very small sample of the original data (1024 of 70000).

1453042287 avatar Jan 18 '19 00:01 1453042287
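
For what it's worth, the `tf.data` documentation describes `shuffle` as a sliding buffer: it fills the buffer with `buffer_size` elements, samples one at random, and replaces it with the next input element. If that description holds, a buffer of 1024 limits how far an element can move ahead of its original position, but it does not discard the rest of the data. The sketch below is a pure-Python simulation of that documented behavior, not TensorFlow's actual implementation; `buffered_shuffle` is a hypothetical helper name.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int,
                     seed: int = 0) -> Iterator[T]:
    """Simulate a sliding shuffle buffer over a stream of items."""
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)          # still filling the buffer
            continue
        # Buffer is full: emit a random element, put the new item in its slot.
        slot = rng.randrange(buffer_size)
        yield buffer[slot]
        buffer[slot] = item
    # Input exhausted: drain whatever is left, in random order.
    rng.shuffle(buffer)
    yield from buffer


out = list(buffered_shuffle(range(70000), buffer_size=1024))
print(len(out), len(set(out)), max(out))  # 70000 70000 69999
```

In this simulation every ID from 0 to 69999 is emitted exactly once. The cost of the small buffer is weaker mixing: the i-th output can only come from roughly the first 1024 + i inputs, so the very first outputs do come from low IDs only. If the source data is sorted (for example by class), it may be worth shuffling the files or records once on disk as well.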