Jun Shi

115 comments by Jun Shi

Assuming multiple GPUs per node and multiple nodes, there are two levels of exchange: inside a node, each GPU computes its gradients based on its batch and sends them to a...
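
As a rough sketch of that two-level exchange (not CaffeOnSpark's actual code; the NumPy vectors and the `two_level_aggregate` helper are made up for illustration):

```python
# Minimal sketch: two-level gradient exchange, assuming each GPU's gradient
# is a NumPy vector. Level 1 reduces inside a node, level 2 across nodes.
import numpy as np

def two_level_aggregate(grads_per_node):
    """grads_per_node: list of nodes, each a list of per-GPU gradient vectors."""
    # Level 1: inside each node, sum the gradients of its local GPUs.
    node_sums = [np.sum(gpu_grads, axis=0) for gpu_grads in grads_per_node]
    # Level 2: exchange the per-node sums across nodes and combine them
    # (a real system would use an all-reduce here instead of a single gather).
    total = np.sum(node_sums, axis=0)
    # Every node ends up with the same aggregated gradient.
    return total

# Example: 2 nodes x 2 GPUs, 3-dimensional gradients.
grads = [[np.ones(3), 2 * np.ones(3)], [3 * np.ones(3), 4 * np.ones(3)]]
print(two_level_aggregate(grads))   # -> [10. 10. 10.]
```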

The sync version is simple to implement and verify. We do not need async training at this moment. In addition, we are limited by our resources. Your contribution is...

@jacklonghui Regarding slicing, it is an efficient implementation of all-reduce. If all the clients sent their gradients to one node, that node would become a bottleneck. What's implemented in...
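
A toy sketch of the idea, assuming each node holds a full gradient vector and is made responsible for reducing one slice of it (the `sliced_allreduce` helper is hypothetical, not the project's implementation):

```python
# Sliced all-reduce sketch: node k reduces slice k of everyone's gradient and
# the reduced slices are then gathered back, so no single node handles the
# whole tensor.
import numpy as np

def sliced_allreduce(grads):
    """grads: one gradient vector per node; returns the reduced vector each node would hold."""
    n = len(grads)
    slices = [np.array_split(g, n) for g in grads]   # split every gradient into n slices
    reduced = [np.sum([slices[src][k] for src in range(n)], axis=0)  # node k reduces slice k
               for k in range(n)]
    return np.concatenate(reduced)                    # all-gather: everyone gets all slices

grads = [np.arange(6, dtype=float) * (i + 1) for i in range(3)]  # 3 nodes, 6-dim gradients
print(sliced_allreduce(grads))        # same result as np.sum(grads, axis=0)
```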

1) Yes, it does training as well. 2) Everybody's gradients are different, since the gradients are calculated on each worker's own mini-batch. The gradients are then aggregated and applied to...
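
For illustration only, a toy data-parallel SGD step on a least-squares model; the model and helper functions here are invented for the example, not taken from CaffeOnSpark:

```python
# Each worker computes a gradient on its own mini-batch; the gradients are
# averaged and the same update is applied everywhere, so all model replicas
# stay identical after the step.
import numpy as np

def local_gradient(w, X, y):
    """Least-squares gradient on one worker's mini-batch."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def parallel_sgd_step(w, batches, lr=0.1):
    grads = [local_gradient(w, X, y) for X, y in batches]   # one gradient per worker
    avg = np.mean(grads, axis=0)                             # aggregate (all-reduce in practice)
    return w - lr * avg                                      # every replica applies the same update

rng = np.random.default_rng(0)
w = np.zeros(4)
batches = [(rng.normal(size=(8, 4)), rng.normal(size=8)) for _ in range(3)]
print(parallel_sgd_step(w, batches))
```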

The line you quote is conceptually true. What's implemented here is different. In this particular implementation, everybody is both a master and a worker, so you can regard every node as...

If you have access to the executors, go there and check the CPU usage, etc. I suspect the job is stuck. For feature extraction, make sure you set the batch size...

I don't know where the problem is. I only use YARN mode, which sets spark.executor.cores to 1, so one core per executor. I am not sure what will happen if...
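
If you want to pin that layout explicitly on submission, something along these lines should work on YARN; the executor count, memory, jar name, and prototxt file below are placeholders, not values from this thread:

```sh
# Illustrative only: request one core per executor explicitly when submitting on YARN.
spark-submit --master yarn --deploy-mode cluster \
    --num-executors 4 \
    --conf spark.executor.cores=1 \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
        -train -conf lenet_memory_solver.prototxt -devices 1 \
        -model hdfs:///mnist.model
```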

It has been a while since I used CaffeOnSpark. I don't remember any problem with "features" mode. Synchronization between executors is not required in this mode, so it...

BVLC Caffe runs on the Jetson TK1. I don't know about Spark. If Spark works on Jetson as well, then it is very likely you can run CaffeOnSpark on it.

You can use "-test" or "-features" to get predictions and to extract features. See Step 8 of this page: https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_yarn. In that example "-train" and "-features" are combined. You can...
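
For reference, a command along the lines of Step 8 of that page, with -train and -features combined; paths, executor counts, and file names below are placeholders, not values from this thread:

```sh
# Sketch following the GetStarted_yarn wiki: train, then extract features in one job.
spark-submit --master yarn --deploy-mode cluster \
    --num-executors 2 \
    --files lenet_memory_solver.prototxt,lenet_memory_train_test.prototxt \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
        -train \
        -features accuracy,loss -label label \
        -conf lenet_memory_solver.prototxt \
        -devices 1 \
        -connection ethernet \
        -model hdfs:///mnist.model \
        -output hdfs:///mnist_features_result
```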