Ming-Hsuan-Tu
### Describe the bug When the distiller contains parameters, it would be converted to DDP. After running AutoSlim, it throws an error like ```bash DDP does have `exec_teacher_forward` ```...
When using Self Distiller with Channel Wise Distill, a backward exception is raised because the tensor from the teacher is not detached. This PR fixes that bug.
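A minimal sketch of the detaching fix described above, using stand-in `teacher`/`student` modules rather than the actual mmrazor classes: when the teacher's output is detached before the distillation loss, backward only flows into the student.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the distiller's teacher and student networks.
teacher = nn.Linear(4, 4)
student = nn.Linear(4, 4)
x = torch.randn(2, 4)

t_out = teacher(x)  # still attached to the teacher's autograd graph
s_out = student(x)

# Detaching cuts the graph at the teacher's output, so backward()
# never reaches the teacher's parameters.
loss = ((s_out - t_out.detach()) ** 2).mean()
loss.backward()

assert teacher.weight.grad is None      # teacher receives no gradients
assert student.weight.grad is not None  # student is trained as expected
```

Without the `.detach()`, the loss would backpropagate into the teacher as well, which is what triggers the backward exception the issue describes.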
It's very similar to #79: it's a bug in the parser, except that here the exception is a maximum-recursion error raised when calling `trace_bn_conv_links` (https://github.com/open-mmlab/mmrazor/blob/master/mmrazor/models/pruners/structure_pruning.py#L124). To reproduce, simply replace the model...
How do I open Tagbar in the current buffer using :sp, like the screenshot on this page: http://starryalley.twbbs.org/blog/index.php?/archives/1221-Notes-to-use-Eclim+Vim-to-develop-Android-App.html ? However, the screenshot on that page uses Taglist, not Tagbar. It...
Hi, I tried `add_scalar_dict` as follows:

```lua
local foo = cc:create_experiment("foo")
local d = {}
d['bar'] = 3
d['hoo'] = 4
foo:add_scalar_dict(d)
```

but got the error ``` attempt to...
I tried to compile TNN on Jetson Nano, but the build failed when CUDA was enabled. ![image](https://user-images.githubusercontent.com/1567200/158764351-0fa75a51-1a33-4b6d-8bfb-44918502016d.png) I didn't enable TensorRT, so why does it need libnvinfer.so?
I noticed that the paper says ![image](https://user-images.githubusercontent.com/1567200/164198555-2cf0566f-15ec-4fa0-8fd7-7d10e8fabe47.png) they subtract `p * weight_decay * learning_rate` from `p`, but in your implementation (https://github.com/jettify/pytorch-optimizer/blob/master/torch_optimizer/sgdw.py#L121) you only subtract `weight_decay * learning_rate` from `p`. Any...
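The two update rules being contrasted above can be written out as plain arithmetic; this is a sketch of the discrepancy the issue reports, not the actual optimizer code:

```python
def paper_step(p, grad, lr, wd):
    # Decoupled weight decay as written in the paper:
    # p <- p - lr * grad - lr * wd * p
    return p - lr * grad - lr * wd * p

def reported_impl_step(p, grad, lr, wd):
    # What the issue says the implementation does (note: the decay term
    # is not multiplied by the parameter value p):
    # p <- p - lr * grad - lr * wd
    return p - lr * grad - lr * wd

p, grad, lr, wd = 2.0, 0.5, 0.1, 0.01
print(paper_step(p, grad, lr, wd))          # decay term scales with p
print(reported_impl_step(p, grad, lr, wd))  # decay term is constant
```

The difference matters because decoupled weight decay is meant to shrink each parameter proportionally to its magnitude; a constant subtraction behaves very differently for large and small weights.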
Hi, can I disable MKL when compiling from source? Is MKL a required package for ArrayFire? Thank you.
Hi, since the pretrained model is trained on ImageNet, is it OK to release it as a commercial model without permission from ImageNet?
Hi, from the paper you use a cosine learning rate scheduler to train on ImageNet; did you apply it to both AdamP and SGDP? What is your opinion on using a constant learning rate scheduler...
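For reference, a cosine schedule like the one asked about above can be sketched with PyTorch's built-in `CosineAnnealingLR`; plain SGD stands in here for AdamP/SGDP so the example stays self-contained:

```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR

model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
# Anneal the lr from 0.1 down to ~0 over 100 epochs following a cosine curve.
sched = CosineAnnealingLR(opt, T_max=100)

lrs = []
for _ in range(100):
    opt.step()   # normally preceded by loss.backward()
    sched.step()
    lrs.append(opt.param_groups[0]["lr"])

assert lrs[0] > lrs[-1]  # the lr decays monotonically toward eta_min (0)
```

A constant schedule would simply skip the `sched.step()` calls; whether the cosine decay helps is exactly the empirical question the issue raises.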