He Ma
This problem was also mentioned in #32
The "ZMQError: Address in use" error happens when the previous run failed and the socket port it opened was not closed properly, causing a port conflict in the...
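A minimal sketch for checking whether a port is still held by a leftover process before starting a new run. It uses only the Python standard library and assumes the sockets are TCP sockets on the local machine; `port_in_use` is a hypothetical helper, not part of this project.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # Binding fails with OSError (EADDRINUSE) if the port is taken.
            probe.bind((host, port))
        except OSError:
            return True
    return False
```

If `port_in_use(5555)` returns True (5555 is only a placeholder port number), a process from the failed run is probably still alive and needs to be stopped before retrying.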
@aryanbhardwaj, your training cost looks okay so far. Are you training on ImageNet data? If you follow the preprocessing steps in this project, you will see 5004 batch files...
@aryanbhardwaj This preprocessing setup is for doing multi-GPU training. Specifically, a single GPU trains with batch_size=256, two GPUs train with batch_size=128 on each GPU, and four GPUs train with batch_size=64...
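The split described above amounts to dividing a fixed total batch of 256 evenly across the GPUs. A small sketch of that arithmetic, where `per_gpu_batch_size` is a hypothetical helper name and not a function from this project:

```python
TOTAL_BATCH_SIZE = 256  # single-GPU batch size quoted in the comment above


def per_gpu_batch_size(n_gpus, total=TOTAL_BATCH_SIZE):
    """Split the total batch evenly across n_gpus."""
    if n_gpus < 1 or total % n_gpus != 0:
        raise ValueError("total batch size must divide evenly across GPUs")
    return total // n_gpus


for n in (1, 2, 4):
    print(n, "GPU(s):", per_gpu_batch_size(n), "per GPU")
```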
@aryanbhardwaj We benchmarked training speed on GTX 1080 and Tesla K80. For GTX 1080, it takes 0.91h per epoch. For Tesla K80, it takes 1.96h per epoch. In total, 60 epochs,...
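As a rough estimate from the per-epoch figures above, the full run time is the time per epoch multiplied by the number of epochs. The sketch below assumes every epoch takes the same time and ignores validation and any other overhead; `total_training_hours` is a hypothetical helper.

```python
HOURS_PER_EPOCH = {"GTX 1080": 0.91, "Tesla K80": 1.96}  # figures quoted above
N_EPOCHS = 60


def total_training_hours(hours_per_epoch, n_epochs=N_EPOCHS):
    """Estimate total training time, assuming constant time per epoch."""
    return hours_per_epoch * n_epochs


for gpu, hours in HOURS_PER_EPOCH.items():
    print(f"{gpu}: about {total_training_hours(hours):.1f} h for {N_EPOCHS} epochs")
```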
@aryanbhardwaj Yes, the data pipeline would be the first thing to check. Verify that your training data matches the training labels. The non-decreasing cost could be due to a bad...
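A first sanity check along these lines could look like the sketch below. It only verifies that there is one label per image and that every label is a valid class index. `check_labels` is a hypothetical helper, and the default of 1000 classes with zero-based labels assumes the standard ImageNet classification setup.

```python
def check_labels(n_images, labels, n_classes=1000):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if len(labels) != n_images:
        problems.append(f"{n_images} images but {len(labels)} labels")
    # Labels are assumed to be zero-based class indices.
    bad = [label for label in labels if not 0 <= label < n_classes]
    if bad:
        problems.append(f"{len(bad)} labels outside [0, {n_classes})")
    return problems
```

Passing this check does not prove that images and labels are in the same order, so it is still worth viewing a few images next to their class names.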
@aryanbhardwaj Interesting. I haven't tried that yet. But I imagine that would require the object to be in some ratio range with respect to the image size as the way...