Michael Klachko

Results: 12 issues by Michael Klachko

1. Display weights (in addition to activations).
2. Display how activations/weights change during training. For example, I should be able to point to a checkpoint directory where my model is saved... (see the sketch below)
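
For point 2, a minimal sketch of what I have in mind, assuming one PyTorch checkpoint per epoch under `checkpoints/`, each holding a `state_dict` (the file layout and key name are hypothetical):

```python
import glob
import torch

# Hypothetical layout: one checkpoint per epoch, each containing a state_dict.
for path in sorted(glob.glob("checkpoints/checkpoint_*.pth")):
    state = torch.load(path, map_location="cpu")["state_dict"]
    for name, w in state.items():
        if w.is_floating_point():
            # Per-tensor summary statistics show how weights drift over training.
            print(f"{path} {name}: mean={w.mean():.4f} std={w.std():.4f}")
```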

question

LibriSpeech dataset (e.g. train-clean-100) is split into multiple directories during preprocessing. Then during training, the code iterates through these directories: https://github.com/zzw922cn/Automatic_Speech_Recognition/blob/master/speechvalley/main/libri_train.py#L159 The problem is that for each directory, a new...
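
To make the setup concrete, this is the pattern I'm describing, sketched with a hypothetical path and a stub standing in for the training entry point:

```python
import os

def train_on(feature_dir):
    """Hypothetical stand-in for one training pass over a directory."""
    print("training on", feature_dir)

root = "/path/to/preprocessed/train-clean-100"  # hypothetical path

# Preprocessing leaves several sibling directories under the root;
# training then loops over them one at a time, so any per-directory
# setup is repeated on every iteration of this loop.
for name in sorted(os.listdir(root)):
    subdir = os.path.join(root, name)
    if os.path.isdir(subdir):
        train_on(subdir)
```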

### 1. Issue or feature description

```
sudo apt-get update
E: Conflicting values set for option Signed-By regarding source https://nvidia.github.io/libnvidia-container/stable/ubuntu18.04/amd64/ /: /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg !=
E: The list of sources could not...
```

https://arxiv.org/abs/1911.09665 In the paper, they propose calculating two losses: one for the forward pass with "clean" BN params, and another for the forward pass with adversarial BN params. Then they...
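
A sketch of my reading of the two-pass scheme, assuming the model exposes a hypothetical `use_adv_bn` flag that switches between the two sets of BN layers:

```python
import torch.nn.functional as F

def advprop_step(model, x_clean, x_adv, y):
    # Clean inputs go through the main BN layers.
    model.use_adv_bn = False          # hypothetical switch between BN sets
    loss_clean = F.cross_entropy(model(x_clean), y)

    # Adversarial inputs go through the auxiliary BN layers.
    model.use_adv_bn = True
    loss_adv = F.cross_entropy(model(x_adv), y)

    # The two losses are combined for a single backward pass.
    (loss_clean + loss_adv).backward()
```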

enhancement
help wanted

Please support InPlace Activated Batchnorm [1] and/or Gradient Checkpointing [2] in the examples (e.g. ResNet). I also saw an example of using TF memory optimizer [3], but I'm not sure...
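
For [2], a minimal sketch of what I mean, using PyTorch's torch.utils.checkpoint for illustration (the per-stage grouping is only an example):

```python
import torch
from torch.utils.checkpoint import checkpoint

class CheckpointedBackbone(torch.nn.Module):
    """Wraps the stages of a ResNet-style model with gradient checkpointing."""

    def __init__(self, stem, stages, head):
        super().__init__()
        self.stem = stem
        self.stages = torch.nn.ModuleList(stages)
        self.head = head

    def forward(self, x):
        x = self.stem(x)
        for stage in self.stages:
            # Activations inside each stage are discarded in forward and
            # recomputed in backward, trading compute for memory.
            x = checkpoint(stage, x)
        return self.head(x)
```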

enhancement

I just cloned your repo, and when I launch the command: `CUDA_VISIBLE_DEVICES=2,3,4,5 python imagenet.py -a mobilenetv2 -d /path/to/dataset/ImageNet2012/ --epochs 150 --lr-decay cos --lr 0.05 --wd 4e-5 -c checkpoints --width-mult 1...

Installed apex with pip, tried running main_amp.py example. Getting this error: `SystemError: returned NULL without setting an error`

> CUDA_VISIBLE_DEVICES=7,1 python -m torch.distributed.launch --nproc_per_node=2 --master_port=$RANDOM main_amp.py --data_dir /mnt/ssd2tb/imagenet -b 128...

Try a single-layer LSTM with 100 neurons and feed it four bars of input at a time; it will do much better.
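
A minimal PyTorch sketch of this suggestion; the steps-per-bar and feature size are hypothetical placeholders:

```python
import torch
import torch.nn as nn

STEPS_PER_BAR, FEATURES = 16, 128   # hypothetical values

lstm = nn.LSTM(input_size=FEATURES, hidden_size=100, num_layers=1,
               batch_first=True)

# Feed four bars at a time as one sequence of 4 * STEPS_PER_BAR steps.
x = torch.randn(8, 4 * STEPS_PER_BAR, FEATURES)   # (batch, seq, features)
output, (h_n, c_n) = lstm(x)
```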

In the [paper](https://arxiv.org/abs/1602.02830), the stochastic quantization was done by rounding up with probability p = clip((x+1)/2, 0, 1), and rounding down with probability 1-p. However, in the [code](https://github.com/eladhoffer/convNet.pytorch/blob/master/models/modules/quantize.py#L67-L68) it's done by adding...
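
To make the comparison concrete, here are the two mechanisms as I understand them, sketched for plain integer rounding (the paper's binarization additionally maps the input through a hard sigmoid to get the probability):

```python
import torch

def round_stochastic_paper(x):
    # Round up with probability equal to the fractional part, down otherwise.
    p = x - x.floor()
    return x.floor() + (torch.rand_like(x) < p).float()

def round_stochastic_noise(x):
    # The additive-noise variant: add uniform noise in [-0.5, 0.5),
    # then round deterministically.
    return (x + torch.empty_like(x).uniform_(-0.5, 0.5)).round()
```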

I noticed that you don't cancel the gradient for large values when using the straight-through estimator [here](https://github.com/eladhoffer/quantized.pytorch/blob/master/models/modules/quantize.py#L89). In the QNN paper it was claimed "Not cancelling the gradient when r is...
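
For reference, a sketch of the cancelled variant: a sign (binarization) function whose straight-through backward zeroes the gradient wherever |r| > 1 (hypothetical autograd Function):

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, r):
        ctx.save_for_backward(r)
        return r.sign()

    @staticmethod
    def backward(ctx, grad_output):
        (r,) = ctx.saved_tensors
        # Cancel the gradient where the input saturates (|r| > 1).
        return grad_output * (r.abs() <= 1).float()
```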