Jun Shi

115 comments by Jun Shi

Any update on this from the developers? I see the same issue in 1.15.0. Does anyone have a workaround? I suppose I could partition the data then save the partitions sequentially,...

We open sourced a similar package to address this issue. You can try it out here. https://github.com/linkedin/spark-tfrecord https://engineering.linkedin.com/blog/2020/spark-tfrecord

@anfeng Do you know how to update the wiki?

If you use the prototxt files given in the repo, then the maximum iteration count is 2000 and the snapshot interval is 5000. https://github.com/yahoo/CaffeOnSpark/blob/master/data/lenet_memory_solver.prototxt#L18-L20 You won't get a snapshot since the...
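The arithmetic behind this can be checked with a minimal sketch. It assumes periodic snapshots fire only at iterations that are multiples of the snapshot interval; any end-of-training snapshot behavior is not modeled here. The function name `periodic_snapshot_iters` is hypothetical and is not part of Caffe or CaffeOnSpark.

```python
def periodic_snapshot_iters(max_iter, snapshot_interval):
    """Return the iterations at which a periodic snapshot would fire.

    Assumption: a snapshot fires at every multiple of snapshot_interval
    up to and including max_iter. An interval of 0 or less is treated
    as "periodic snapshots disabled".
    """
    if snapshot_interval <= 0:
        return []
    return list(range(snapshot_interval, max_iter + 1, snapshot_interval))


# Values quoted in the comment: 2000 iterations, interval of 5000.
print(periodic_snapshot_iters(2000, 5000))   # []

# Hypothetical longer run, for contrast.
print(periodic_snapshot_iters(10000, 5000))  # [5000, 10000]
```

With the quoted values the list is empty, so either lowering the snapshot interval or raising the iteration count would be needed for a periodic snapshot to appear.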

Try to disable validation by setting the following in your solver.prototxt file: test_iter: 0 test_interval: 0

It is possible that GPUs were not available to you, or some of the cluster settings are incorrect. Let's say a node has only one GPU, but two containers are...

Make sure you are comparing the same total batch size. https://github.com/yahoo/CaffeOnSpark/issues/244
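As a rough illustration of what "the same total batch size" means, here is a small sketch. It assumes the batch size in the network definition applies per device, so the total grows with the number of devices and nodes; check the linked issue for how your setup actually behaves. The function names and the numbers in the example are hypothetical.

```python
def total_batch_size(per_device_batch, devices_per_node, nodes):
    """Total samples processed per iteration across the whole cluster.

    Assumption: every device on every node processes its own batch of
    per_device_batch samples in each iteration.
    """
    return per_device_batch * devices_per_node * nodes


def per_device_batch_for(total_batch, devices_per_node, nodes):
    """Per-device batch size needed to reach a given total batch size."""
    workers = devices_per_node * nodes
    if total_batch % workers != 0:
        raise ValueError("total batch is not divisible by the device count")
    return total_batch // workers


# Hypothetical comparison: one device with batch 64 versus
# 4 nodes with 1 device each, still configured with batch 64.
print(total_batch_size(64, 1, 1))      # 64
print(total_batch_size(64, 1, 4))      # 256

# To compare like with like, the 4-node run would use 16 per device.
print(per_device_batch_for(64, 1, 4))  # 16
```

Under that assumption, a timing comparison is only fair when both runs process the same number of samples per iteration.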

I do not know your setup. We have seen a slight improvement with LeNet on GPU. But CaffeOnSpark was not really designed to speed up tiny networks/datasets like LeNet/MNIST. We...

Your hardware is fine. It is not obvious to me why CaffeOnSpark is so slow.

Great. One useful experiment would be to run CaffeOnSpark on a single node, then compare it to Caffe.