anfeng

Results 15 comments of anfeng

Please check your solver configuration file. Andy On Wed, Jul 13, 2016 at 10:57 AM, GnosisYu [email protected] wrote: > I built a spark cluster with several nodes to implement CaffeOnSpark....

Please explain why it doesn't "make sense". We will be happy to enhance it as needed. Be aware that SocketCaffeNet is a low-level API invoked by CaffeOnSpark via JNI. - https://github.com/yahoo/CaffeOnSpark/blob/master/caffe-grid/src/main/scala/com/yahoo/ml/caffe/CaffeProcessor.scala#L76-L77...

I agree that we should expand the unit tests to simulate distributed training using SocketCaffeNet. @fanshiqing any interest in working on it? We will be happy to review your contributions.

You should create a tgz file, say cos.tgz, containing lib64/liblmdbjni.so etc., pass that tgz file via --archives, and extend the executor's LD_LIBRARY_PATH to include ":cos.tgz/lib64". tar -cpzf ${HOME}/tmp/cos.tgz lib64 spark-submit...
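The packaging step above can be sketched in Python. This is only an illustration under stated assumptions: the helper names `build_native_archive` and `spark_submit_args` are hypothetical, the executor path is set through Spark's `spark.executorEnv.LD_LIBRARY_PATH` property (one possible way to extend it), and the remaining spark-submit arguments, which the comment elides, are not filled in.

```python
import os
import tarfile
import tempfile


def build_native_archive(lib_dir, archive_path):
    """Pack lib_dir into a gzipped tarball, keeping its base name (e.g. lib64) as the top-level folder."""
    top_level = os.path.basename(os.path.normpath(lib_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(lib_dir, arcname=top_level)
    return archive_path


def spark_submit_args(archive_path, base_ld_path, lib_subdir="lib64"):
    """Return the leading spark-submit options (hypothetical helper).

    The archive is shipped with --archives, and the executor's
    LD_LIBRARY_PATH is extended with ":<archive name>/<lib_subdir>".
    Application-specific arguments are intentionally left out.
    """
    archive_name = os.path.basename(archive_path)
    ld_path = f"{base_ld_path}:{archive_name}/{lib_subdir}"
    return [
        "spark-submit",
        "--archives", archive_path,
        "--conf", f"spark.executorEnv.LD_LIBRARY_PATH={ld_path}",
    ]


if __name__ == "__main__":
    # Demo with a placeholder library file in a temporary directory.
    workdir = tempfile.mkdtemp()
    lib_dir = os.path.join(workdir, "lib64")
    os.makedirs(lib_dir)
    open(os.path.join(lib_dir, "liblmdbjni.so"), "wb").close()

    archive = build_native_archive(lib_dir, os.path.join(workdir, "cos.tgz"))
    print(" ".join(spark_submit_args(archive, "/usr/lib64")))
```

The archive keeps `lib64` as its top-level folder so that the relative path `cos.tgz/lib64` resolves once the archive is unpacked in the executor's working directory.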

Can you be more specific? CaffeOnSpark is designed for big data. We would be happy to learn about use cases with a mobile platform.

According to your log, your executor exited for some reason. Can you get a log of your executor? It could be caused by executor resource limitations. Andy On Fri,...

@ptgoetz I could live with NimbusTracker, but I don't like SupervisorTracker. How about renaming it to SupervisorPeer? We may also want to rename BaseTracker to BasePeer.

With the introduction of BT, we no longer need the following Nimbus interface methods. Why are we still keeping them?
- beginFileDownload
- downloadChunk
We should also remove Utils::downloadFromMaster().

Do you want to merge #422 first? This new pull request is much simpler though.

The revised code should address issues raised in your comments. Hopefully we are ready to merge.