tensorflow-yolo3
Is there an efficient way to shuffle the data?
dataset = dataset.repeat().shuffle(70000).batch(batch_size).prefetch(batch_size)

I tested the shuffle function, and I believe buffer_size decides the maximum index of the original data that can be sampled. My dataset is huge, so when I train the model it gets stuck filling the shuffle buffer at 40k+ elements, like this:

2018-11-21 21:07:14.170579: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 46287 of 70000
2018-11-21 21:07:24.262936: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:98] Filling up shuffle buffer (this may take a while): 46432 of 70000

No more logs appear after that. Any suggestions would be appreciated!
I have the same problem. Did you solve it?
@WeifaGan not yet :(
Change shuffle(70000) to shuffle(1024).
@Duferen Assume the data IDs are range(0, 70000). If I change 70000 to 1024, I will never get the data whose ID comes after 1024, so I would only be using a very small sample of the original data (1024 of 70000).
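For anyone checking the concern above: as far as I understand the documented behaviour of tf.data.Dataset.shuffle, the buffer is a sliding window rather than a hard cap. It is filled with the first buffer_size elements, one element is drawn at random, and the freed slot is refilled with the next element from the input. The sketch below is a pure-Python simulation of that idea, not TensorFlow's actual implementation; the function name buffered_shuffle and the seed parameter are made up for illustration.

```python
import random


def buffered_shuffle(source, buffer_size, seed=None):
    """Simulate a bounded shuffle buffer over an iterable (hypothetical helper).

    Assumption: this mirrors the fill / sample / refill behaviour described
    for tf.data.Dataset.shuffle; it is not TensorFlow code.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in source:
        if len(buffer) < buffer_size:
            # Still filling the buffer: nothing is emitted yet.
            buffer.append(item)
            continue
        # Buffer is full: emit a random element and put the new item in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: emit whatever is left in the buffer, in random order.
    rng.shuffle(buffer)
    yield from buffer


# IDs 0..69999 with a buffer of only 1024 elements.
shuffled_ids = list(buffered_shuffle(range(70000), buffer_size=1024, seed=0))
```

Under this model every ID in range(0, 70000) is still emitted exactly once per pass, including those after 1024. What the smaller buffer costs is shuffle quality: an element can only be emitted once it has entered the window, so early outputs come from early inputs and the result is a local shuffle rather than a uniform one.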