Jun Shi

115 comments by Jun Shi

Assuming multiple GPUs per node and multiple nodes, there are two levels of exchange: inside a node, each GPU computes its gradients based on its batch and sends them to a...
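
As a rough sketch of that two-level exchange (not CaffeOnSpark's actual code; the NumPy vectors and the `two_level_aggregate` helper are made up for illustration):

```python
# Minimal sketch: two-level gradient exchange, assuming each GPU's gradient
# is a NumPy vector. Level 1 reduces inside a node, level 2 across nodes.
import numpy as np

def two_level_aggregate(grads_per_node):
    """grads_per_node: list of nodes, each a list of per-GPU gradient vectors."""
    # Level 1: inside each node, sum the gradients of its local GPUs.
    node_sums = [np.sum(gpu_grads, axis=0) for gpu_grads in grads_per_node]
    # Level 2: exchange the per-node sums across nodes and combine them
    # (a real system would use an all-reduce here instead of a single gather).
    total = np.sum(node_sums, axis=0)
    # Every node ends up with the same aggregated gradient.
    return total

# Example: 2 nodes x 2 GPUs, 3-dimensional gradients.
grads = [[np.ones(3), 2 * np.ones(3)], [3 * np.ones(3), 4 * np.ones(3)]]
print(two_level_aggregate(grads))   # -> [10. 10. 10.]
```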

The sync version is simple to implement and verify. We do not need async training at this moment. In addition, we are limited by our resources. Your contribution is...

@jacklonghui Regarding slicing, it is an efficient implementation of all-reduce. If all the clients sent their gradients to one node, that node would become a bottleneck. What's implemented in...
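
A toy sketch of the idea, assuming each node holds a full gradient vector and is made responsible for reducing one slice of it (the `sliced_allreduce` helper is hypothetical, not the project's implementation):

```python
# Sliced all-reduce sketch: node k reduces slice k of everyone's gradient and
# the reduced slices are then gathered back, so no single node handles the
# whole tensor.
import numpy as np

def sliced_allreduce(grads):
    """grads: one gradient vector per node; returns the reduced vector each node would hold."""
    n = len(grads)
    slices = [np.array_split(g, n) for g in grads]   # split every gradient into n slices
    reduced = [np.sum([slices[src][k] for src in range(n)], axis=0)  # node k reduces slice k
               for k in range(n)]
    return np.concatenate(reduced)                    # all-gather: everyone gets all slices

grads = [np.arange(6, dtype=float) * (i + 1) for i in range(3)]  # 3 nodes, 6-dim gradients
print(sliced_allreduce(grads))        # same result as np.sum(grads, axis=0)
```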

1) Yes, it does training as well. 2) Everybody's gradients are different, since the gradients are calculated on each worker's own mini-batch. The gradients are then aggregated and applied to...
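
For illustration only, a toy data-parallel SGD step on a least-squares model; the model and helper functions here are invented for the example, not taken from CaffeOnSpark:

```python
# Each worker computes a gradient on its own mini-batch; the gradients are
# averaged and the same update is applied everywhere, so all model replicas
# stay identical after the step.
import numpy as np

def local_gradient(w, X, y):
    """Least-squares gradient on one worker's mini-batch."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def parallel_sgd_step(w, batches, lr=0.1):
    grads = [local_gradient(w, X, y) for X, y in batches]   # one gradient per worker
    avg = np.mean(grads, axis=0)                             # aggregate (all-reduce in practice)
    return w - lr * avg                                      # every replica applies the same update

rng = np.random.default_rng(0)
w = np.zeros(4)
batches = [(rng.normal(size=(8, 4)), rng.normal(size=8)) for _ in range(3)]
print(parallel_sgd_step(w, batches))
```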

The line you quote is conceptually true. What's implemented here is different. In this particular implementation, everybody is both a master and a worker, so you can regard every node as...

If you have access to the executors, go there and check the CPU usage, etc. I suspect the job is stuck. For feature extraction, make sure you set the batch size...

I don't know where the problem is. I only use YARN mode, which sets spark.executor.cores to 1, so one core per executor. I am not sure what will happen if...
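
If you want to pin that layout explicitly on submission, something along these lines should work on YARN; the executor count, memory, jar name, and prototxt file below are placeholders, not values from this thread:

```sh
# Illustrative only: request one core per executor explicitly when submitting on YARN.
spark-submit --master yarn --deploy-mode cluster \
    --num-executors 4 \
    --conf spark.executor.cores=1 \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
        -train -conf lenet_memory_solver.prototxt -devices 1 \
        -model hdfs:///mnist.model
```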

It has been a while since I used CaffeOnSpark. I don't remember any problem with "features" mode. Synchronization between executors is not required in this mode, so it...

BVLC Caffe runs on the Jetson TK1. I don't know about Spark. If Spark works on Jetson as well, then it is very likely you can run CaffeOnSpark on it.

You can use "-test" or "-features" to get predictions and to extract features. See Step 8 of this page: https://github.com/yahoo/CaffeOnSpark/wiki/GetStarted_yarn. In that example "-train" and "-features" are combined. You can...
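
For reference, a command along the lines of Step 8 of that page, with -train and -features combined; paths, executor counts, and file names below are placeholders, not values from this thread:

```sh
# Sketch following the GetStarted_yarn wiki: train, then extract features in one job.
spark-submit --master yarn --deploy-mode cluster \
    --num-executors 2 \
    --files lenet_memory_solver.prototxt,lenet_memory_train_test.prototxt \
    --class com.yahoo.ml.caffe.CaffeOnSpark \
    caffe-grid-0.1-SNAPSHOT-jar-with-dependencies.jar \
        -train \
        -features accuracy,loss -label label \
        -conf lenet_memory_solver.prototxt \
        -devices 1 \
        -connection ethernet \
        -model hdfs:///mnist.model \
        -output hdfs:///mnist_features_result
```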