SocketCaffeNet UT should be enhanced

Open fanshiqing opened this issue 8 years ago • 4 comments

After carefully checking the codebase, it seems that the current SocketCaffeNet-related UT does not make much sense. I'm curious how you ensure the correctness of the distributed training mode quickly and conveniently as the codebase evolves. Is there perhaps already a workaround outside of CaffeOnSpark?

Thanks in advance for any help :) @anfeng

fanshiqing Oct 19 '16 08:10

Please explain why it doesn't "make sense". We will be happy to enhance it as needed.

Be aware that SocketCaffeNet is a low-level API invoked by CaffeOnSpark via JNI.

  • https://github.com/yahoo/CaffeOnSpark/blob/master/caffe-grid/src/main/scala/com/yahoo/ml/caffe/CaffeProcessor.scala#L76-L77
  • https://github.com/yahoo/CaffeOnSpark/blob/master/caffe-distri/src/main/cpp/jni/JniCaffeNet.cpp#L47-L64

anfeng Oct 19 '16 22:10

I'm basically familiar with the CaffeOnSpark codebase and have been developing on it for several months. What I mean is: why not add a complete training test for SocketCaffeNet with cluster_size >= 2, just like the one for LocalCaffeNet?

fanshiqing Oct 19 '16 22:10

I agree that we should expand the unit tests to simulate distributed training using SocketCaffeNet.

@fanshiqing any interest in working on it? We will be happy to review your contributions.

anfeng Oct 19 '16 23:10

Thanks! @anfeng Actually, in my case I have changed the native CaffeOnSpark code framework, and now I need to verify that my changes still work correctly for true distributed deep training, just as the native CaffeOnSpark does. The basic test that uses LocalCaffeNet has passed; more complicated tests that simulate distributed training locally still need to be carried out and verified carefully. I have encountered some problems that haven't been resolved yet.

fanshiqing Oct 19 '16 23:10