Jun Shi
Thanks for the result. CaffeOnSpark incurs quite a bit of overhead on a single node. I don't know the answer to your second question. As for the first question, Spark puts Caffe...
The ultimate comparison should be this: how long it takes to reach a certain accuracy, say 90%, with 1 node, 2 nodes, etc. This comparison is hard since one...
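A minimal sketch of that time-to-accuracy comparison, in Python. It assumes you have already extracted, for each cluster size, a time-ordered list of (elapsed seconds, test accuracy) pairs from the training logs; the function names and the log format are hypothetical, not part of CaffeOnSpark.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical log format: (elapsed_seconds, test_accuracy) pairs, in time order.
AccuracyLog = List[Tuple[float, float]]


def time_to_accuracy(log: AccuracyLog, target: float) -> Optional[float]:
    """Return the first elapsed time at which accuracy reaches `target`, or None."""
    for elapsed, accuracy in log:
        if accuracy >= target:
            return elapsed
    return None


def speedup_at_accuracy(logs: Dict[int, AccuracyLog], target: float) -> Dict[int, Optional[float]]:
    """Speedup of each cluster size over the 1-node run, measured at equal accuracy.

    `logs` maps node count -> AccuracyLog and must contain an entry for 1 node.
    Elapsed times are assumed to be positive. A value of None means that run
    (or the 1-node baseline) never reached `target`.
    """
    baseline = time_to_accuracy(logs[1], target)
    result: Dict[int, Optional[float]] = {}
    for nodes in sorted(logs):
        t = time_to_accuracy(logs[nodes], target)
        result[nodes] = None if baseline is None or t is None else baseline / t
    return result
```

Measuring speedup at equal accuracy, rather than raw iterations per second, also accounts for any extra iterations a larger cluster may need before it converges.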
Fix the solver prototxt file, I suppose.
You could run the solver file on the single-node version first, i.e., BVLC Caffe. Of course, you need to change the network prototxt file accordingly (switch out the data...
Multiple sources are not supported at the moment. This is a feature we plan to support in the future.
No, our focus is on distributed file formats, such as Spark DataFrame and Hadoop SequenceFile. In the future, better support for DataFrame will be our main development effort. Those single-node file...
Gradients are sent as soon as they are available; however, all nodes wait for the updated weights before proceeding to the next iteration.
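This synchronization pattern can be simulated in a few lines of Python. It is a toy model, not CaffeOnSpark code: each thread stands in for a node, the "model" is a single scalar fit to the mean of the data, and `threading.Barrier` plays the role of the step where every node waits for the updated weights. How gradients actually travel between machines is not modeled.

```python
import threading
from typing import List


def sync_data_parallel_sgd(shards: List[List[float]], w0: float = 0.0,
                           lr: float = 0.5, iterations: int = 20) -> float:
    """Toy synchronous data-parallel SGD on the loss 0.5 * mean((w - x)^2).

    Worker `rank` owns `shards[rank]`; all workers share one weight value.
    """
    n = len(shards)
    weights = [w0]       # shared model (a single scalar here)
    grads = [0.0] * n    # one gradient slot per worker

    def apply_update() -> None:
        # threading.Barrier calls this once per iteration, after every worker
        # has arrived and before any worker is released.
        weights[0] -= lr * sum(grads) / n

    barrier = threading.Barrier(n, action=apply_update)

    def worker(rank: int) -> None:
        shard = shards[rank]
        for _ in range(iterations):
            w = weights[0]
            # Gradient of 0.5 * mean((w - x)^2) over this worker's shard.
            grads[rank] = sum(w - x for x in shard) / len(shard)
            # "Send" the gradient, then block until the updated weights exist.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return weights[0]
```

Because the barrier's `action` runs after all workers arrive and before any is released, no worker starts the next iteration with stale weights; the cost is that every iteration proceeds at the pace of the slowest node.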
Depending on your data format, the dataset is handled by the corresponding class. For example, if you use a DataFrame to store your images, labels, etc., then the file below...
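One common way to structure such format-dependent handling is a registry that maps a format name to a reader class. The Python sketch below illustrates only that pattern; every class and function name in it is hypothetical, and CaffeOnSpark's actual data-source classes, and how one is selected, should be taken from its source tree.

```python
from typing import Callable, Dict, Iterator, Tuple, Type

# Hypothetical record layout: (sample id, label, encoded image bytes).
Record = Tuple[str, int, bytes]


class DataSource:
    """Hypothetical base class: one subclass per on-disk data format."""

    def __init__(self, path: str) -> None:
        self.path = path

    def records(self) -> Iterator[Record]:
        raise NotImplementedError


# Registry mapping a lower-case format name to the class that reads it.
_SOURCES: Dict[str, Type[DataSource]] = {}


def register(fmt: str) -> Callable[[Type[DataSource]], Type[DataSource]]:
    def decorator(cls: Type[DataSource]) -> Type[DataSource]:
        _SOURCES[fmt] = cls
        return cls
    return decorator


@register("dataframe")
class DataFrameSource(DataSource):
    pass  # would read rows of a Spark DataFrame


@register("sequencefile")
class SequenceFileSource(DataSource):
    pass  # would read key/value pairs of a Hadoop SequenceFile


@register("lmdb")
class LmdbSource(DataSource):
    pass  # would read an LMDB database


def make_source(fmt: str, path: str) -> DataSource:
    """Pick the reader class for `fmt` (case-insensitive) and bind it to `path`."""
    try:
        cls = _SOURCES[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported data format: {fmt!r}") from None
    return cls(path)
```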
First, prepare the dataset. An image dataset can be stored on HDFS in multiple formats (e.g., SequenceFile, DataFrame, or LMDB; LMDB is not encouraged for large datasets since it...
Yes, you need to generate the dataset manually before training/testing. We provide some example tools: https://github.com/yahoo/CaffeOnSpark/tree/master/caffe-grid/src/main/scala/com/yahoo/ml/caffe/tools. You can build your own conversion tools if they don't meet your requirements. The best...
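If the example tools do not fit your data, the first step of a custom converter is usually to pair each image with its label. The Python sketch below assumes a hypothetical input layout: a list file whose lines read `<relative/path> <integer label>` (the style of list Caffe's ImageData layer reads) plus an image root directory. It only yields (path, label, raw bytes) records; writing them to HDFS as a SequenceFile or DataFrame with Spark, in the exact schema CaffeOnSpark expects, is not shown and should follow the linked example tools.

```python
import os
from typing import Iterator, Tuple


def read_image_list(list_file: str, root: str) -> Iterator[Tuple[str, int, bytes]]:
    """Yield (relative_path, label, raw_image_bytes) for each line of `list_file`.

    Each non-empty line must hold "<relative/path> <integer label>". The line is
    split on its last whitespace run, so paths containing spaces still work.
    """
    with open(list_file, "r", encoding="utf-8") as f:
        for lineno, line in enumerate(f, 1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                rel_path, label = line.rsplit(None, 1)
                label_int = int(label)
            except ValueError:
                raise ValueError(f"{list_file}:{lineno}: expected '<path> <label>'") from None
            # Keep the encoded bytes as-is; decoding is left to the training side.
            with open(os.path.join(root, rel_path), "rb") as img:
                yield rel_path, label_int, img.read()
```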