Jun Shi comments

Results: 115 comments of Jun Shi

CaffeOnSpark slow in comparison with caffe

Thanks for the result. CaffeOnSpark incurs quite a bit of overhead on a single node. I don't know the answer to your second question. As for the first question, Spark puts Caffe...

CaffeOnSpark slow in comparison with caffe

The ultimate comparison should be this: how much time does it take to achieve a certain accuracy, say 90%, on 1 node, 2 nodes, etc.? This comparison is hard since one...
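The time-to-accuracy metric described above can be computed from a training log. The following is only a minimal sketch: the function name `time_to_accuracy` and the log format, a time-ordered list of (elapsed seconds, accuracy) pairs, are assumptions for illustration and not something CaffeOnSpark produces in this form.

```python
from typing import Iterable, Optional, Tuple


def time_to_accuracy(
    log: Iterable[Tuple[float, float]], target: float
) -> Optional[float]:
    """Return the elapsed seconds at which accuracy first reaches `target`.

    `log` is a hypothetical, time-ordered sequence of
    (elapsed_seconds, accuracy) pairs collected during training.
    Returns None if the target accuracy is never reached.
    """
    for elapsed, accuracy in log:
        if accuracy >= target:
            return elapsed
    return None
```

Calling this once per cluster size (1 node, 2 nodes, etc.) with the same target, e.g. `time_to_accuracy(log, 0.90)`, gives directly comparable numbers, provided each run logs accuracy on the same test set.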

Core dump failures

Fix the solver prototxt file, I suppose.

Core dump failures

You could run the solver file on the single-node version first, i.e. BVLC Caffe. Of course, you need to change the network prototxt file accordingly (switch out the data...

Does CaffeOnSpark support multiple LMDB Files For Training/Testing

Multiple sources are not supported at the moment. This is a feature we plan to support in the future.

Does CaffeOnSpark support multiple LMDB Files For Training/Testing

No, our focus is distributed file formats, such as Spark DataFrame, Hadoop SequenceFile, etc. In the future, better support of DataFrame will be our main development effort. Those single-node file...

On the timing problem of parameter synchronization.

Gradients are sent as soon as they are available; however, all the nodes wait for the updated weights before proceeding to the next iteration.
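The synchronization pattern described above (publish gradients immediately, then block until the new weights exist) can be illustrated with a toy single-process simulation. This is a sketch under stated assumptions, not CaffeOnSpark's implementation: threads stand in for nodes, the model is a single scalar weight, and `run_sync_sgd` and its per-worker targets are hypothetical names chosen for the example.

```python
import threading


def run_sync_sgd(num_workers: int, num_iters: int, lr: float = 0.1) -> float:
    """Toy synchronous SGD: worker k minimizes (w - k)^2 on a shared weight w."""
    targets = [float(k) for k in range(num_workers)]  # hypothetical per-worker data
    weight = [0.0]                # shared model parameter
    grads = [0.0] * num_workers   # one slot per worker for its latest gradient

    def apply_update() -> None:
        # Runs in exactly one thread per iteration, after every worker has
        # published its gradient and before any worker is released.
        weight[0] -= lr * sum(grads) / num_workers

    barrier = threading.Barrier(num_workers, action=apply_update)

    def worker(k: int) -> None:
        for _ in range(num_iters):
            # Publish the gradient as soon as it is computed.
            grads[k] = 2.0 * (weight[0] - targets[k])
            # Block until all gradients are in and the weight is updated.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(k,)) for k in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weight[0]
```

With two workers the averaged gradient pulls the weight toward the mean of the targets, so `run_sync_sgd(2, 100)` converges close to 0.5. The barrier is what makes the iteration time equal to that of the slowest worker.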

image dataset on HDFS

Depending on your data format, the dataset is handled by the relevant class. For example, if you use a data frame to store your images, labels, etc., then the file below...

image dataset on HDFS

First, you prepare the dataset. An image dataset can be stored on HDFS in multiple formats (e.g. sequence file, data frame, LMDB; LMDB is not encouraged for large datasets since it...

Yes, you need to generate the dataset manually before training/testing. We provide some example tools: https://github.com/yahoo/CaffeOnSpark/tree/master/caffe-grid/src/main/scala/com/yahoo/ml/caffe/tools You can build your own conversion tools if these don't meet your requirements. The best...