L-OBS
Found some errors when using L-OBS to prune deep CNNs (AlexNet, VGG, etc.)
Hi Shangyu, thanks for sharing the code! After reading your paper, I did some experiments. The code at L-OBS/lenet300-100 does a good job pruning LeNet-300-100 (a few small bugs exist, but they are easy to fix). But I found an error in L-OBS/Resnet50/calculate_hessian_inverse.py, in the Hessian matrix calculation:
```python
def calculate_hessian_fc_tf(layer_inputs):
    # layer_inputs: [batch_size, n_in]
    a = tf.expand_dims(layer_inputs, axis=-1)                    # [batch, n_in, 1]
    a = tf.concat([a, tf.ones([tf.shape(a)[0], 1, 1])], axis=1)  # append 1 for the bias: [batch, n_in + 1, 1]
    b = tf.expand_dims(layer_inputs, axis=1)                     # [batch, 1, n_in]
    b = tf.concat([b, tf.ones([tf.shape(b)[0], 1, 1])], axis=2)  # [batch, 1, n_in + 1]
    outprod = tf.multiply(a, b)                                  # per-sample outer products: [batch, n_in + 1, n_in + 1]
    return tf.reduce_mean(outprod, axis=0)                       # average Hessian over the batch axis only
```
My understanding is that before calculating the Hessian inverse, the accumulated Hessian should be divided by dataset_size, which equals batch_size * num_batches, not by batch_size alone (which is all the reduce_mean above does). Is that right?
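To make this concrete, here is a small numpy sketch of the normalization I have in mind (sizes and data are made up, just a sanity check of my own):

```python
import numpy as np

# Made-up sizes, for illustration only
num_batches, batch_size, n_in = 10, 32, 100
dataset_size = num_batches * batch_size

hessian = np.zeros((n_in + 1, n_in + 1))
for _ in range(num_batches):
    x = np.random.randn(batch_size, n_in)                      # layer inputs for one batch
    a = np.concatenate([x, np.ones((batch_size, 1))], axis=1)  # append 1 for the bias term
    hessian += a.T.dot(a)                                      # sum of per-sample outer products a_i a_i^T

hessian /= dataset_size  # normalize by batch_size * num_batches, not by batch_size
```

Equivalently, one could keep the per-batch reduce_mean and then average the resulting per-batch Hessians over num_batches.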
Some errors also occur in L-OBS/Resnet50/prune_weights.py:

```python
def prune_weights_fc(weights, biases, hessian_inverse, CR):
    n_hidden_1 = int(weights.shape[0])
    n_hidden_2 = int(weights.shape[1])
    gate_w = np.ones([n_hidden_1, n_hidden_2])
    gate_b = np.ones([n_hidden_2])
    sensitivity = np.array([])
    for i in range(n_hidden_2):
        sensitivity = np.hstack(
            (sensitivity, 0.5 * (np.hstack((weights.T[i], biases[i])) ** 2) / np.diag(hessian_inverse)))
    sorted_index = np.argsort(sensitivity)  # Sort from small to big
    # Begin pruning
    n_total = int(n_hidden_1 * n_hidden_2)
    n_total_prune = int(n_hidden_1 * n_hidden_2 * (1 - CR))
    for i in range(n_total_prune):
        prune_index = sorted_index[i]
        x_index = prune_index // (n_hidden_1 + 1)  # next-layer unit: 0 .. n_hidden_2
        y_index = prune_index % (n_hidden_1 + 1)   # this-layer unit: 0 .. n_hidden_1
        if y_index == n_hidden_1:  # bias
            delta_w = (-biases[x_index] / (hessian_inverse[y_index][y_index])) * hessian_inverse.T[y_index]
            gate_b[x_index] = 0
        else:
            delta_w = (-weights[x_index][y_index] / hessian_inverse[y_index][y_index]) * hessian_inverse.T[y_index]
            gate_w[x_index][y_index] = 0
        weights[x_index] = weights[x_index] + delta_w[0: -1].T
        # I think it should be:
        #     delta_w = (-weights[y_index][x_index] / hessian_inverse[y_index][y_index]) * hessian_inverse.T[y_index]
        #     gate_w[y_index][x_index] = 0
        #     weights.T[x_index] = weights.T[x_index] + delta_w[0: -1]
        biases[x_index] = biases[x_index] + delta_w[-1]
        # Watch info
        if i % n_total == 0 and i != 0:
            CR = int(100 - (i / n_total) * 5)
            print '[%s] Now prune to CR: %d' % (datetime.now(), CR)
    weights = weights * gate_w
    biases = biases * gate_b
    if not os.path.exists('pruned_weights/%s/' % layer_name):
        os.mkdir('pruned_weights/%s/' % layer_name)
    np.save('pruned_weights/%s/weights.npy' % layer_name, weights)
    np.save('pruned_weights/%s/biases.npy' % layer_name, biases)
```
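To double-check the indexing, here is a tiny standalone script (toy sizes, my own sanity check). Since sensitivity is built by stacking one block of length n_hidden_1 + 1 per output unit (weights.T[i] plus its bias), the flattened index maps back like this:

```python
# Toy sizes, for illustration only: weights has shape [n_hidden_1, n_hidden_2]
n_hidden_1, n_hidden_2 = 3, 2
row_len = n_hidden_1 + 1  # each output unit contributes n_hidden_1 weights plus 1 bias

for prune_index in range(n_hidden_2 * row_len):
    x_index = prune_index // row_len  # output unit (a column of weights)
    y_index = prune_index % row_len   # input unit, or the bias slot when == n_hidden_1
    if y_index == n_hidden_1:
        print('index %d -> biases[%d]' % (prune_index, x_index))
    else:
        # the pruned entry is weights[y_index][x_index], not weights[x_index][y_index]
        print('index %d -> weights[%d][%d]' % (prune_index, y_index, x_index))
```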
After I corrected the bugs and modified some code to calculate hessian_inverse on the GPU (after testing, using the GPU does not affect the result of hessian_inverse), I pruned ResNet50 at a 60% pruning rate, and the output of the FC layer is all NaN. Something must be going wrong. Have you tested this code, or is it just toy code? I found a lot of bugs in it, and it did not work when I used it to prune deep CNNs. I have tested ResNet50 (outputs NaN), AlexNet (accuracy drops a lot), and VGG16 (accuracy drops a lot). Waiting for your answer. Best wishes!
Hi @dsfour ,
Thanks for using our code.
You are correct about the Hessian matrix calculation: the Hessian should be divided by batch_size * num_batches, i.e. the full dataset size.
For the NaN problem: sometimes the diagonal of the Hessian inverse is so small that the division in prune_weights.py produces NaN. We solve it with another method of calculating the Hessian inverse, mentioned in the paper, which hasn't been uploaded yet.
We are currently transferring the code to PyTorch for refinement, and more complete code will be uploaded. Thanks for describing your situation on ResNet50, AlexNet, VGG and GoogleNet; we will run more experiments on these networks to find the problems, and will let you know when the new code is uploaded.
Best regards, Shangyu
Thank you for your quick reply, @csyhhu! I also found that the diagonal of the Hessian inverse is very small for some layers and didn't know how to solve it. You guys did a good job, and your answer confirms my guess! I am now trying to use the Woodbury matrix identity mentioned in your paper to calculate the Hessian inverse matrix (HIM). But it is much slower (more than 10x) when calculating the HIM of a conv layer. I think this is because the computation of the HIM in your paper is recursive, and a conv layer has many more input samples (after extract_image_patches). Besides, the method in the paper uses more matrix-product ops than the original method. It takes more than 10 hours to calculate the HIM of one conv layer of AlexNet (100000 input images) on a single M40 GPU. Is that normal?
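For reference, here is a minimal sketch of the recursive update I implemented, as I understand it from the paper (the initialization constant alpha and the exact scaling are my own assumptions):

```python
import numpy as np

def hessian_inverse_recursive(inputs, alpha=1e-6):
    """Recursive Hessian inverse via the Sherman-Morrison/Woodbury identity.

    inputs: [num_samples, n] layer inputs (bias column already appended).
    alpha: small damping constant; H_0 = alpha * I, so H_0^{-1} = I / alpha.
    """
    num_samples, n = inputs.shape
    h_inv = np.eye(n) / alpha
    for j in range(num_samples):
        a = inputs[j].reshape(n, 1)  # one sample as a column vector
        ha = h_inv.dot(a)            # n x 1
        # rank-1 downdate; a couple of O(n^2) products per sample is what
        # makes conv layers (with ~1k patches per image) so slow
        h_inv -= ha.dot(ha.T) / (num_samples + a.T.dot(ha))
    return h_inv
```

Each sample costs a few O(n^2) products, so with roughly 1k patches per image the per-layer cost adds up very quickly, which matches the slowdown I see.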
Hi @dsfour ,
The slow speed comes from the huge number of patches extracted from the input data: because of the sliding window in convolution, one input image always yields on the order of 1k patches. In my current work I found that it is fine to reduce the number of input images (from 100000 to 200) and to increase the stride, so as to reduce the number of extracted patches; such reduction does not hurt the final performance much (though I am not sure whether this simplification applies here). Maybe you can try that, as in the sketch below.
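A rough sketch of the strided extraction I mean, using tf.extract_image_patches (TF 1.x; the shapes and ksize/stride values here are placeholders, not our actual settings):

```python
import tensorflow as tf

images = tf.placeholder(tf.float32, [None, 224, 224, 3])

# Dense extraction: one patch per spatial position, huge patch count
dense_patches = tf.extract_image_patches(
    images, ksizes=[1, 3, 3, 1], strides=[1, 1, 1, 1],
    rates=[1, 1, 1, 1], padding='SAME')

# Strided extraction: stride 4 keeps roughly 1/16 of the patches,
# so far fewer samples feed the Hessian estimate per image
sparse_patches = tf.extract_image_patches(
    images, ksizes=[1, 3, 3, 1], strides=[1, 4, 4, 1],
    rates=[1, 1, 1, 1], padding='SAME')
```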
If 100000 images are used with all of their extracted patches, I think the time will indeed be many hours; what you are seeing is normal.
I will try to reduce the Hessian-calculation time in the upcoming code, so that every experiment can be finished within one day.
Sorry for the confusion. I will publish the code as soon as possible; it should be finished this week.