Wei Wen

Results: 41 comments by Wei Wen

Ternarizing gradients before sending them to the server is easy in distributed mode, but scaler sharing takes a lot of work, and I am not even sure it is feasible in TensorFlow since its code...
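For context, here is a minimal NumPy sketch of the ternarization step itself (illustrative only; the function name is mine, and the repo implements this in TensorFlow). The per-tensor scaler `s` is the value that would have to be shared across workers:

```python
import numpy as np

def ternarize(grad, rng=None):
    """Stochastically ternarize a gradient into {-s, 0, +s}.

    s = max|g| is the per-tensor scaler; each element keeps its sign
    with probability |g_i|/s, so E[ternarize(g)] = g (unbiased).
    """
    rng = rng or np.random.default_rng()
    s = np.abs(grad).max()
    if s == 0.0:
        return np.zeros_like(grad)
    keep = rng.random(grad.shape) < np.abs(grad) / s  # Bernoulli(|g|/s)
    return s * np.sign(grad) * keep
```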

The code needs at least one GPU to support multiple virtual workers. You may try replacing [this line](https://github.com/wenwei202/terngrad/blob/master/terngrad/inception/inception_train.py#L317) with `with tf.device('/cpu:0'):` and see if it would work or...
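An untested sketch of that change (TF 1.x graph mode, matching the repo's vintage; `build_tower` is a stand-in for whatever graph construction that line actually wraps):

```python
import tensorflow as tf  # TF 1.x graph-mode API

def build_tower():
    """Stand-in for the per-worker graph built at the linked line."""
    x = tf.constant([[1.0, 2.0]])
    w = tf.Variable(tf.zeros([2, 1]))
    return tf.reduce_sum(tf.matmul(x, w))

# Instead of pinning each virtual worker to a GPU as the original line does,
# force CPU placement so the code can run on a GPU-less machine:
with tf.device('/cpu:0'):
    loss = build_tower()
```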

[Here](https://github.com/wenwei202/caffe/blob/scnn/src/caffe/proto/caffe.proto#L650-L651) is the explanation. `DIRECT_SCONV` is deprecated in this branch and moved to https://github.com/IntelLabs/SkimCaffe. We used MKL Sparse BLAS and CUDA cuSPARSE for sparse computation.

I don't quite understand your questions. The way you set group/block sizes depends on what kinds of structured sparsity you want to learn. After learning, some blocks/groups will be removed....

This is a little weird! Are you able to train the baseline without any regularization? Caffe is relatively old, and you should consider switching to other frameworks like PyTorch.

@aradar Thanks for your interest in SSL. SSL can be easily applied to frameworks that support autograd, such as TensorFlow and PyTorch. You just need to add group Lasso regularization...
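For example, a PyTorch sketch of a filter-wise group-Lasso term (my own illustrative code, not from the repo; `lam` and the grouping are choices you make, and other groupings such as channel-wise or shape-wise just change which dimensions are folded into each group):

```python
import torch

def group_lasso(conv_weight, eps=1e-8):
    """Filter-wise group Lasso for a conv weight of shape
    (out_channels, in_channels, kH, kW): sum over output filters of
    each filter's L2 norm. Groups driven to zero become removable
    filters; eps keeps the sqrt differentiable at zero.
    """
    return torch.sqrt((conv_weight ** 2).sum(dim=(1, 2, 3)) + eps).sum()

# Hypothetical usage inside a training step:
# loss = task_loss + lam * sum(group_lasso(m.weight)
#                              for m in model.modules()
#                              if isinstance(m, torch.nn.Conv2d))
```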

@aradar if you remove structures (such as filters and channels), then you won't have to. You will just need to create a smaller DNN with the learned structures (such as...
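A sketch of that step in PyTorch, assuming filter-wise sparsity was learned (the threshold, layer sizes, and helper name are all illustrative, not values from the paper):

```python
import torch

def surviving_filters(conv_weight, tol=1e-4):
    """Indices of output filters whose L2 norm exceeds tol; after SSL
    training, the zeroed filters can simply be dropped."""
    norms = conv_weight.detach().flatten(1).norm(dim=1)
    return (norms > tol).nonzero(as_tuple=True)[0]

old = torch.nn.Conv2d(64, 128, 3)
with torch.no_grad():
    old.weight[::2].zero_()  # pretend SSL zeroed every other filter

# Build a smaller layer containing only the surviving filters.
keep = surviving_filters(old.weight)
new = torch.nn.Conv2d(64, len(keep), 3)
with torch.no_grad():
    new.weight.copy_(old.weight[keep])
    new.bias.copy_(old.bias[keep])
```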

You need to do some [cuSPARSE initialization and destruction](https://github.com/wenwei202/caffe/blob/scnn/src/caffe/common.cpp#L107) to make it work. Please refer to the cuSPARSE guide for details.

@zjykzj it's great that you are reimplementing it in PyTorch after the old days of using Caffe.