Tutorial_BayesianCompressionForDL
Samples of compression for LeNet
Hi author,
Thanks for sharing the code; I am very interested in this work. When I test compression on LeNet, it raises a "dimension not match" error. Could you share an example of compressing a neural network with convolutional layers?
Hi Lyken17,
sorry for coming back to you so late. Notifications are activated now ;).
The first thing that pops into my mind is a PyTorch version issue. Could you provide me a conda list or equivalent?
A complete example is included, and you should be able to run it as-is. What exactly are you missing in our tutorial?
Best,
Karen
Hi Karen,
The output of conda list is:
(test) ➜ Tutorial_BayesianCompressionForDL git:(master) ✗ conda list
# packages in environment at /home/ligeng/anaconda3/envs/test:
#
# Name Version Build Channel
ca-certificates 2018.03.07 0
certifi 2018.1.18 py36_0
cycler 0.10.0 <pip>
imageio 2.3.0 <pip>
kiwisolver 1.0.1 <pip>
libedit 3.1 heed3624_0
libffi 3.2.1 hd88cf55_4
libgcc-ng 7.2.0 hdf63c60_3
libstdcxx-ng 7.2.0 hdf63c60_3
matplotlib 2.2.2 <pip>
ncurses 6.0 h9df7e31_2
numpy 1.14.2 <pip>
openssl 1.0.2o h20670df_0
pandas 0.22.0 <pip>
Pillow 5.1.0 <pip>
pip 9.0.3 py36_0
pyparsing 2.2.0 <pip>
python 3.6.5 hc3d631a_0
python-dateutil 2.7.2 <pip>
pytz 2018.4 <pip>
PyYAML 3.12 <pip>
readline 7.0 ha6073c6_4
scipy 1.0.1 <pip>
seaborn 0.8.1 <pip>
setuptools 39.0.1 py36_0
six 1.11.0 <pip>
sqlite 3.22.0 h1bed415_0
tk 8.6.7 hc745277_3
torch 0.3.1 <pip>
torchvision 0.2.0 <pip>
wheel 0.31.0 py36_0
xz 5.2.3 h55aa19d_2
zlib 1.2.11 ha838bed_2
When I try to run the LeNet example with python example.py, I get the following errors:
(test) ➜ Tutorial_BayesianCompressionForDL git:(master) ✗ python example.py
Traceback (most recent call last):
File "example.py", line 193, in <module>
main()
File "example.py", line 37, in main
transforms.ToTensor(),lambda x: 2 * (x - 0.5),
File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 53, in __init__
os.path.join(self.root, self.processed_folder, self.training_file))
File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/serialization.py", line 267, in load
return _load(f, map_location, pickle_module)
File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/serialization.py", line 420, in _load
result = unpickler.load()
AttributeError: Can't get attribute '_rebuild_tensor_v2' on <module 'torch._utils' from '/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/_utils.py'>
Oops, the error is different from what I saw two months ago. I guess there have been some API updates in PyTorch.
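For anyone hitting the same AttributeError: it usually means the processed MNIST files were pickled by a newer PyTorch than the one loading them. A commonly circulated workaround (a sketch, not necessarily the exact patch applied here) backfills the missing attribute before the dataset is loaded:

import torch
import torch._utils

# Older PyTorch (e.g. 0.3.1) lacks _rebuild_tensor_v2, which newer versions
# use when pickling tensors; define it in terms of the old _rebuild_tensor.
try:
    torch._utils._rebuild_tensor_v2
except AttributeError:
    def _rebuild_tensor_v2(storage, storage_offset, size, stride,
                           requires_grad, backward_hooks):
        tensor = torch._utils._rebuild_tensor(storage, storage_offset, size, stride)
        tensor.requires_grad = requires_grad
        tensor._backward_hooks = backward_hooks
        return tensor
    torch._utils._rebuild_tensor_v2 = _rebuild_tensor_v2

Deleting the processed MNIST files so torchvision regenerates them should work as well.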
After solving some compatibility issues, I modified the network to LeNet and re-ran python example.py.
The network structure is:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = BayesianLayers.Conv2dGroupNJ(1, 6, 5)
        self.conv2 = BayesianLayers.Conv2dGroupNJ(6, 16, 5)
        # activation
        self.relu = nn.ReLU()
        # layers
        self.fc1 = BayesianLayers.LinearGroupNJ(16 * 5 * 5, 120, clip_var=0.04, cuda=FLAGS.cuda)
        self.fc2 = BayesianLayers.LinearGroupNJ(120, 84, cuda=FLAGS.cuda)
        self.fc3 = BayesianLayers.LinearGroupNJ(84, 10, cuda=FLAGS.cuda)
        # layers including kl_divergence
        self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2, self.fc3]

    def forward(self, x):
        # x = x.view(-1, 28 * 28)
        # x = self.relu(self.fc1(x))
        # x = self.relu(self.fc2(x))
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
The command-line output is:
(test) ➜ Tutorial_BayesianCompressionForDL git:(master) ✗ python example.py
Traceback (most recent call last):
File "example.py", line 217, in <module>
main()
File "example.py", line 176, in main
train(epoch)
File "example.py", line 147, in train
output = model(data)
File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "example.py", line 90, in forward
out = F.relu(self.fc1(out))
File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "/home/ligeng/Public/Developing/Tutorial_BayesianCompressionForDL/BayesianLayers.py", line 126, in forward
xz = x * z
RuntimeError: The size of tensor a (256) must match the size of tensor b (400) at non-singleton dimension 1
The modified example.py is uploaded as a gist: https://gist.github.com/Lyken17/8e0cae9a9aa6911190fd1b580ca75296
I can run the original example without problems, but when I try to run it with convolutional layers, I cannot figure out the proper way. Could you show an example of pruning LeNet?
Hi Lyken17,
the problem you are experiencing has little to do with the Bayesian layers; it is a plain shape mismatch. For a 28x28 MNIST input, conv1 (5x5 kernel) gives 24x24 feature maps, pooling halves that to 12x12, conv2 gives 8x8, and pooling gives 4x4. The feature map coming out of conv2 is therefore (16, 4, 4), so fc1 must take 16 * 4 * 4 = 256 inputs rather than 16 * 5 * 5 = 400. If you change that, it should run. Additionally, I recommend passing the cuda status to all layers.
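If you want to double-check such shapes, here is a minimal sketch with plain nn.Conv2d stand-ins (no Bayesian layers needed; on torch 0.3.1 you would wrap the input in a Variable first):

import torch
import torch.nn.functional as F
from torch import nn

# Trace the feature-map shapes for a single 28x28 MNIST image.
conv1 = nn.Conv2d(1, 6, 5)
conv2 = nn.Conv2d(6, 16, 5)
x = torch.randn(1, 1, 28, 28)
out = F.max_pool2d(F.relu(conv1(x)), 2)    # -> (1, 6, 12, 12)
out = F.max_pool2d(F.relu(conv2(out)), 2)  # -> (1, 16, 4, 4)
print(out.size())  # fc1 must take 16 * 4 * 4 = 256 inputs

The corrected network: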
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # activation
        self.relu = nn.ReLU()
        # layers
        self.conv1 = BayesianLayers.Conv2dGroupNJ(1, 6, 5, cuda=FLAGS.cuda)
        self.conv2 = BayesianLayers.Conv2dGroupNJ(6, 16, 5, cuda=FLAGS.cuda)
        self.fc1 = BayesianLayers.LinearGroupNJ(16 * 4 * 4, 120, clip_var=0.04, cuda=FLAGS.cuda)
        self.fc2 = BayesianLayers.LinearGroupNJ(120, 84, cuda=FLAGS.cuda)
        self.fc3 = BayesianLayers.LinearGroupNJ(84, 10, cuda=FLAGS.cuda)
        # layers including kl_divergence
        self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2, self.fc3]

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
Runs for me!
I will also add a requirements file so that we do not run into trouble with PyTorch's API changes.
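Something like this, pinned to the versions in your conda list (a sketch; the exact pins may need adjusting):

torch==0.3.1
torchvision==0.2.0
numpy==1.14.2
scipy==1.0.1
matplotlib==2.2.2
seaborn==0.8.1
imageio==2.3.0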
Cheers, Karen
@KarenUllrich The network trains fine with convolutional layers, but the compression.py functions do not work for convolutional weights/filters. I have made some changes in compute_posterior_params to compute post_weight_mu and post_weight_var correctly for convolutional layers.
I still get the error in extract_pruned_params because the sizes of the mask and of post_weight_mu differ for conv layer 1. To be specific, in the example above, post_weight_mu has size (6, 1, 5, 5) whereas the mask has size (16, 6). It looks like get_masks() needs to be changed as well to produce correct masks for convolutional filters.
Is that right?
Hi,
I am having the same issue. The conv network trains, but I am unable to get compression rates; same error as above. Here is a snippet to reproduce, adapted from example.py.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # activation
        self.relu = nn.ReLU()
        # layers
        self.conv1 = BayesianLayers.Conv2dGroupNJ(1, 16, 5, cuda=FLAGS.cuda, padding=2)
        self.conv2 = BayesianLayers.Conv2dGroupNJ(16, 36, 5, cuda=FLAGS.cuda, padding=2)
        self.fc1 = BayesianLayers.LinearGroupNJ(36 * 7 * 7, 128, clip_var=0.04, cuda=FLAGS.cuda)
        self.fc2 = BayesianLayers.LinearGroupNJ(128, 10, cuda=FLAGS.cuda)
        # pool
        self.pool = nn.MaxPool2d((2, 2))
        # layers including kl_divergence
        self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2]

    def forward(self, x):
        x = x.view(-1, 1, 28, 28)
        x = self.conv1(x)
        x = self.pool(x)
        x = self.relu(x)
        x = self.conv2(x)
        x = self.pool(x)
        x = self.relu(x)
        x = x.view(-1, 36 * 7 * 7)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x
When I run python convMLP.py --batchsize 64 --epochs 1, I get:
Epoch: 1 Train loss: 15.456320
Test loss: 0.0380, Accuracy: 9883/10000 (98.83%)
Traceback (most recent call last):
File "convMLP.py", line 204, in <module>
main()
File "convMLP.py", line 181, in main
compute_compression_rate(layers, model.get_masks(thresholds))
File "compression.py", line 119, in compute_compression_rate
weight_mus, weight_vars = extract_pruned_params(layers, masks)
File "compression.py", line 83, in extract_pruned_params
post_weight_mu, post_weight_var = layer.compute_posterior_params()
File "BayesianLayers.py", line 251, in compute_posterior_params
self.post_weight_var = self.z_mu.pow(2) * weight_var + z_var * self.weight_mu.pow(2) + z_var * weight_var
RuntimeError: The size of tensor a (16) must match the size of tensor b (5) at non-singleton dimension 3
In your paper you show compression rates for VGG and other convolutional architectures; that is what I am trying to reproduce. Help!
Aswin
You will need to make some changes in BayesianLayers.py and in the get_masks() function to prune the conv layers. With the current code, you can only prune linear layers. This is the compute_posterior_params() I am using for conv layers:
def compute_posterior_params(self):
    weight_var, z_var = self.weight_logvar.exp(), self.z_logvar.exp()
    # Broadcast the per-output-channel z statistics over the
    # (in_channels, kernel_h, kernel_w) dimensions of the conv weights.
    part1 = self.z_mu.pow(2)[:, None, None, None] * weight_var
    part2 = z_var[:, None, None, None] * self.weight_mu.pow(2)
    part3 = z_var[:, None, None, None] * weight_var
    self.post_weight_var = part1 + part2 + part3
    self.post_weight_mu = self.z_mu[:, None, None, None] * self.weight_mu
    return self.post_weight_mu, self.post_weight_var
To explain this in a bit more detail: for LeNet-5's first conv layer, z_mu and weight_var are of size (20) and (20, 1, 5, 5) respectively, so multiplying them directly raises an error.
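A minimal sketch of the broadcasting involved (standalone tensors, not the actual layer):

import torch

z_mu = torch.randn(20)                # one multiplicative scale per output channel
weight_mu = torch.randn(20, 1, 5, 5)  # conv weights: (out, in, kH, kW)

# z_mu * weight_mu would raise an error: sizes (20) and (20, 1, 5, 5) do not broadcast.
post_mu = z_mu[:, None, None, None] * weight_mu  # (20, 1, 1, 1) broadcasts cleanly
print(post_mu.size())  # (20, 1, 5, 5)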
You will also need to change the get_masks() function to create masks for the conv weights.
Thank you @gullalc for your answer. Do you know what the changed get_masks() would be? EDIT: It would be great if you could open a PR with those conv changes; hopefully the authors will merge them. EDIT2: Thank you for adding an explanation. It would be great if @KarenUllrich could comment.
Sure. This is the get_masks() function I am using. It basically accounts for the difference in weight shapes between conv and linear layers, as done in compute_posterior_params, and it flattens the mask of the last conv layer so that it can be multiplied with the mask of the first linear layer. The code is self-explanatory. I think this should work for both CNNs and fully connected networks, although it could be simplified a bit.
def get_masks(self, thresholds):
    weight_masks = []
    mask = None
    for i, (layer, threshold) in enumerate(zip(self.kl_list, thresholds)):
        # compute dropout mask
        if len(layer.weight_mu.shape) > 2:
            # conv layer: weights have shape (out_channels, in_channels, kH, kW)
            if mask is None:
                mask = [True] * layer.in_channels
            else:
                mask = np.copy(next_mask)
            # `layers` is the module-level tuple of Bayesian layers from example.py
            log_alpha = layers[i].get_log_dropout_rates().cpu().data.numpy()
            next_mask = log_alpha < thresholds[i]
            weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)
            # broadcast the (out, in) channel mask over the spatial dimensions
            weight_mask = weight_mask[:, :, None, None]
        else:
            if mask is None:
                log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                mask = log_alpha < threshold
            elif len(weight_mask.shape) > 2:
                # first linear layer after a conv layer: repeat the channel mask
                # over the flattened spatial positions (integer division in Python 3)
                temp = next_mask.repeat(layer.in_features // next_mask.shape[0])
                log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                mask = log_alpha < threshold
                # mask = mask | temp  # upper bound for number of weights at first fully connected layer
                mask = mask & temp  # lower bound for number of weights at fully connected layer
            else:
                mask = np.copy(next_mask)
            try:
                log_alpha = layers[i + 1].get_log_dropout_rates().cpu().data.numpy()
                next_mask = log_alpha < thresholds[i + 1]
            except IndexError:
                # must be the last mask
                next_mask = np.ones(10)
            weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)
        weight_masks.append(weight_mask.astype(np.float))
    return weight_masks
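For reference, this is how it plugs into the example script, following the flow already in example.py (the layer tuple and thresholds below are placeholders for the CNN above):

# after training: one threshold per layer in kl_list
layers = (model.conv1, model.conv2, model.fc1, model.fc2)
thresholds = FLAGS.thresholds
compute_compression_rate(layers, model.get_masks(thresholds))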