keras-applications
The drop connect rate (aka survival rate) is incorrect
I originally posted this as an issue here: https://github.com/qubvel/efficientnet/issues/135
However, I noticed the two implementations are the same and the error exists here as well, so I decided to post it here too.
I just verified this against the reference tf.keras implementation, and here are the results. Below is the output for B5.
This implementation's drop connect rate
(index, name, rate)
0 block1b_drop 0.9875
1 block1c_drop 0.975
2 block2b_drop 0.95
3 block2c_drop 0.9375
4 block2d_drop 0.925
5 block2e_drop 0.9125
6 block3b_drop 0.8875
7 block3c_drop 0.875
8 block3d_drop 0.8625
9 block3e_drop 0.85
10 block4b_drop 0.825
11 block4c_drop 0.8125
12 block4d_drop 0.8
13 block4e_drop 0.7875
14 block4f_drop 0.775
15 block4g_drop 0.7625
16 block5b_drop 0.7375
17 block5c_drop 0.725
18 block5d_drop 0.7124999999999999
19 block5e_drop 0.7
20 block5f_drop 0.6875
21 block5g_drop 0.675
22 block6b_drop 0.6499999999999999
23 block6c_drop 0.6375
24 block6d_drop 0.625
25 block6e_drop 0.6125
26 block6f_drop 0.6
27 block6g_drop 0.5874999999999999
28 block6h_drop 0.575
29 block6i_drop 0.5625
30 block7b_drop 0.5375
31 block7c_drop 0.5249999999999999
32 top_dropout 0.6
TensorFlow's drop connect rate
0 block1b_drop 0.9948717948717949
1 block1c_drop 0.9897435897435898
2 block2b_drop 0.9794871794871794
3 block2c_drop 0.9743589743589743
4 block2d_drop 0.9692307692307692
5 block2e_drop 0.9641025641025641
6 block3b_drop 0.9538461538461538
7 block3c_drop 0.9487179487179487
8 block3d_drop 0.9435897435897436
9 block3e_drop 0.9384615384615385
10 block4b_drop 0.9282051282051282
11 block4c_drop 0.9230769230769231
12 block4d_drop 0.9179487179487179
13 block4e_drop 0.9128205128205128
14 block4f_drop 0.9076923076923077
15 block4g_drop 0.9025641025641026
16 block5b_drop 0.8923076923076922
17 block5c_drop 0.8871794871794871
18 block5d_drop 0.882051282051282
19 block5e_drop 0.8769230769230769
20 block5f_drop 0.8717948717948718
21 block5g_drop 0.8666666666666667
22 block6b_drop 0.8564102564102564
23 block6c_drop 0.8512820512820513
24 block6d_drop 0.8461538461538461
25 block6e_drop 0.841025641025641
26 block6f_drop 0.8358974358974359
27 block6g_drop 0.8307692307692307
28 block6h_drop 0.8256410256410256
29 block6i_drop 0.8205128205128205
30 block7b_drop 0.8102564102564103
31 block7c_drop 0.8051282051282052
32 top_dropout 0.6
At index 18 (block5d) the rate is already off by a significant amount: 0.7125 here vs. 0.8821 in tf.keras.
I checked the drop rate per block and it looks fine:
block1a_ 1.0  block1b_ 0.9875  block1c_ 0.975
block2a_ 0.9625  block2b_ 0.95  block2c_ 0.9375  block2d_ 0.925  block2e_ 0.9125
block3a_ 0.9  block3b_ 0.8875  block3c_ 0.875  block3d_ 0.8625  block3e_ 0.85
block4a_ 0.8375  block4b_ 0.825  block4c_ 0.8125  block4d_ 0.8  block4e_ 0.7875  block4f_ 0.775  block4g_ 0.7625
block5a_ 0.75  block5b_ 0.7375  block5c_ 0.725  block5d_ 0.7124999999999999  block5e_ 0.7  block5f_ 0.6875  block5g_ 0.675
block6a_ 0.6625  block6b_ 0.6499999999999999  block6c_ 0.6375  block6d_ 0.625  block6e_ 0.6125  block6f_ 0.6  block6g_ 0.5874999999999999  block6h_ 0.575  block6i_ 0.5625
block7a_ 0.55  block7b_ 0.5375  block7c_ 0.5249999999999999
@darcula1993 I'm confused. Shouldn't the block rate be at ~0.8 for the final block since the drop_connect_rate is 0.2 by default?
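As a quick sanity check (assuming the default drop_connect_rate of 0.2, a linear schedule over B5's 39 residual blocks, and 0-based block indexing, per the tf.keras output above), the survival rate of the final block should indeed land near 0.8:

```python
# Survival rate of residual block i (0-based) out of `blocks` total, with the
# stochastic-depth drop rate scaled linearly from 0 up to drop_connect_rate.
def survival_rate(i, blocks, drop_connect_rate=0.2):
    return 1.0 - drop_connect_rate * i / blocks

# B5 has 39 blocks after depth scaling; block7c is the 39th (index 38),
# so it should keep roughly 80% of samples.
print(survival_rate(38, 39))  # ≈ 0.805, matching tf.keras's block7c value
```

With the buggy denominator of 16 (see below in the thread), the same index 38 instead yields 1 − 0.2 × 38/16 ≈ 0.525, which is exactly the block7c value printed above.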
So it turns out I pasted different values. However the problem remains as indicated.
for j in range(round_repeats(args.pop('repeats'))):
    # The first block needs to take care of stride and filter size increase.
    if j > 0:
        args['strides'] = 1
        args['filters_in'] = args['filters_out']
    x = block(x, activation_fn, drop_connect_rate * b / blocks,
              name='block{}{}_'.format(i + 1, chr(j + 97)), **args)
    b += 1
I checked the code, and it seems that `b` can grow larger than the total number of blocks it is divided by; I'm not sure why.
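My reading of this (a sketch, not verified against every version of the code): the denominator `blocks` appears to be computed from the *un-scaled* `repeats` in the block args, while `b` is incremented once per *depth-scaled* repeat, so `b / blocks` overshoots 1.0 for any model with a depth coefficient above 1. Assuming the standard EfficientNet per-stage repeats [1, 2, 2, 3, 3, 4, 1] and B5's depth coefficient of 2.2, the numbers line up exactly with both outputs above:

```python
import math

# Base repeats per stage in the EfficientNet block args, and B5's depth coefficient
# (assumed values; see the EfficientNet configs).
BASE_REPEATS = [1, 2, 2, 3, 3, 4, 1]
DEPTH_COEFFICIENT = 2.2  # B5

def round_repeats(repeats, depth_coefficient=DEPTH_COEFFICIENT):
    # Same ceiling-based rounding as the reference implementation.
    return int(math.ceil(depth_coefficient * repeats))

# Suspected bug: dividing by the un-scaled total ...
blocks_wrong = float(sum(BASE_REPEATS))                            # 16.0
# ... instead of the depth-scaled total that b actually counts up to.
blocks_right = float(sum(round_repeats(r) for r in BASE_REPEATS))  # 39.0

drop_connect_rate = 0.2
b = blocks_right - 1  # counter value reached at the last block (block7c)

print(1 - drop_connect_rate * b / blocks_wrong)  # ≈ 0.525, the buggy block7c value
print(1 - drop_connect_rate * b / blocks_right)  # ≈ 0.805, the tf.keras block7c value
```

So `b` reaches 38 while `blocks` is 16, which is why the per-block rates fall all the way to ~0.52 instead of bottoming out near 0.8.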
I've observed the same thing as well.
Qubvel's implementation does not calculate the total number of blocks correctly for configurations larger than B0.
In practice it actually performs better than the official implementation. -_-
I've only observed better performance in one case, so I'm not sure it generalizes. In that case, though, the improved performance does suggest that extremely low survival rates (<0.3) might be a good regularization approach.
Well, I'm not sure; maybe I need to look again properly. In fact, I spent almost a week assuming there was probably some problem with my data loader when using the official EfficientNet. But when I used the non-official implementation, it was just fine.