
The drop connect rate (aka survival rate) is incorrect

Open xhluca opened this issue 3 years ago • 9 comments

I originally posted this as an issue here: https://github.com/qubvel/efficientnet/issues/135

However, I noticed the two implementations are the same and the error exists here as well, so I decided to post it here.


I just verified with the reference tf.keras implementation, and here are the results. Below is the output for B5:

This implementation's drop connect rate

(index, name, rate)

0 block1b_drop 0.9875
1 block1c_drop 0.975
2 block2b_drop 0.95
3 block2c_drop 0.9375
4 block2d_drop 0.925
5 block2e_drop 0.9125
6 block3b_drop 0.8875
7 block3c_drop 0.875
8 block3d_drop 0.8625
9 block3e_drop 0.85
10 block4b_drop 0.825
11 block4c_drop 0.8125
12 block4d_drop 0.8
13 block4e_drop 0.7875
14 block4f_drop 0.775
15 block4g_drop 0.7625
16 block5b_drop 0.7375
17 block5c_drop 0.725
18 block5d_drop 0.7124999999999999
19 block5e_drop 0.7
20 block5f_drop 0.6875
21 block5g_drop 0.675
22 block6b_drop 0.6499999999999999
23 block6c_drop 0.6375
24 block6d_drop 0.625
25 block6e_drop 0.6125
26 block6f_drop 0.6
27 block6g_drop 0.5874999999999999
28 block6h_drop 0.575
29 block6i_drop 0.5625
30 block7b_drop 0.5375
31 block7c_drop 0.5249999999999999
32 top_dropout 0.6

Tensorflow's drop connect rate

0 block1b_drop 0.9948717948717949
1 block1c_drop 0.9897435897435898
2 block2b_drop 0.9794871794871794
3 block2c_drop 0.9743589743589743
4 block2d_drop 0.9692307692307692
5 block2e_drop 0.9641025641025641
6 block3b_drop 0.9538461538461538
7 block3c_drop 0.9487179487179487
8 block3d_drop 0.9435897435897436
9 block3e_drop 0.9384615384615385
10 block4b_drop 0.9282051282051282
11 block4c_drop 0.9230769230769231
12 block4d_drop 0.9179487179487179
13 block4e_drop 0.9128205128205128
14 block4f_drop 0.9076923076923077
15 block4g_drop 0.9025641025641026
16 block5b_drop 0.8923076923076922
17 block5c_drop 0.8871794871794871
18 block5d_drop 0.882051282051282
19 block5e_drop 0.8769230769230769
20 block5f_drop 0.8717948717948718
21 block5g_drop 0.8666666666666667
22 block6b_drop 0.8564102564102564
23 block6c_drop 0.8512820512820513
24 block6d_drop 0.8461538461538461
25 block6e_drop 0.841025641025641
26 block6f_drop 0.8358974358974359
27 block6g_drop 0.8307692307692307
28 block6h_drop 0.8256410256410256
29 block6i_drop 0.8205128205128205
30 block7b_drop 0.8102564102564103
31 block7c_drop 0.8051282051282052
32 top_dropout 0.6

At index 18 this implementation is off by a significant amount (0.7125 here vs. 0.8821 in tf.keras), and the gap keeps growing toward the final blocks.
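For context, drop connect (a.k.a. stochastic depth) zeroes an entire residual branch with probability `rate` at training time, so the values listed above are survival probabilities `1 - rate`. A minimal NumPy sketch, purely illustrative (this is not the library's implementation):

```python
import numpy as np

def drop_connect(x, rate, training=True, rng=None):
    """Stochastic depth: drop the whole residual branch per sample
    with probability `rate`, rescaling survivors at train time."""
    if not training or rate == 0.0:
        return x
    rng = rng or np.random.default_rng()
    survival = 1.0 - rate
    # One Bernoulli draw per sample, broadcast over the remaining axes.
    mask = rng.binomial(1, survival, size=(x.shape[0],) + (1,) * (x.ndim - 1))
    return x * mask / survival

x = np.ones((4, 2, 2, 3))
y = drop_connect(x, rate=0.2)
# Each sample in y is either all zeros (dropped) or scaled by 1/0.8.
```

A survival rate of 0.525 for the last block, as in the table above, means almost half of all forward passes skip that block entirely, far more aggressive than the intended 0.8.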

xhluca avatar Dec 20 '20 23:12 xhluca

I checked the drop rate per block and it looks fine:

block1a_ 1.0 block1b_ 0.9875 block1c_ 0.975
block2a_ 0.9625 block2b_ 0.95 block2c_ 0.9375 block2d_ 0.925 block2e_ 0.9125
block3a_ 0.9 block3b_ 0.8875 block3c_ 0.875 block3d_ 0.8625 block3e_ 0.85
block4a_ 0.8375 block4b_ 0.825 block4c_ 0.8125 block4d_ 0.8 block4e_ 0.7875 block4f_ 0.775 block4g_ 0.7625
block5a_ 0.75 block5b_ 0.7375 block5c_ 0.725 block5d_ 0.7124999999999999 block5e_ 0.7 block5f_ 0.6875 block5g_ 0.675
block6a_ 0.6625 block6b_ 0.6499999999999999 block6c_ 0.6375 block6d_ 0.625 block6e_ 0.6125 block6f_ 0.6 block6g_ 0.5874999999999999 block6h_ 0.575 block6i_ 0.5625
block7a_ 0.55 block7b_ 0.5375 block7c_ 0.5249999999999999

darcula1993 avatar Dec 21 '20 07:12 darcula1993

@darcula1993 I'm confused. Shouldn't the block rate be at ~0.8 for the final block since the drop_connect_rate is 0.2 by default?

xhluca avatar Dec 22 '20 15:12 xhluca

So it turns out I pasted different values. However, the problem remains as indicated.

xhluca avatar Dec 22 '20 15:12 xhluca

        for j in range(round_repeats(args.pop('repeats'))):
            # The first block needs to take care of stride and filter size increase.
            if j > 0:
                args['strides'] = 1
                args['filters_in'] = args['filters_out']
            x = block(x, activation_fn, drop_connect_rate * b / blocks,
                      name='block{}{}_'.format(i + 1, chr(j + 97)), **args)
            b += 1

I checked the code, and it seems that `b` can become greater than the total number of blocks; I'm not sure why.
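A minimal sketch of what appears to be happening (variable names here are illustrative): if the `blocks` denominator is the unscaled sum of the base (B0) stage repeats while `b` counts the depth-scaled blocks, then `b / blocks` exceeds 1 whenever the depth coefficient is above 1.0. The numbers line up exactly with the B5 tables above:

```python
import math

def round_repeats(repeats, depth_coefficient):
    # Scale a stage's repeat count by the depth coefficient, rounding up.
    return int(math.ceil(depth_coefficient * repeats))

BASE_REPEATS = [1, 2, 2, 3, 3, 4, 1]  # B0 per-stage repeats from the paper
DEPTH_B5 = 2.2                        # B5 depth coefficient

blocks_buggy = sum(BASE_REPEATS)                                      # 16
blocks_fixed = sum(round_repeats(r, DEPTH_B5) for r in BASE_REPEATS)  # 39

drop_connect_rate = 0.2
# Survival rate of the first dropping block (b = 1):
print(1 - drop_connect_rate / blocks_buggy)  # 0.9875 -> matches the table here
print(1 - drop_connect_rate / blocks_fixed)  # 0.9948717... -> matches tf.keras
```

With 39 actual blocks but a denominator of 16, the last block's drop rate becomes 0.2 * 38 / 16 ≈ 0.475 instead of staying below 0.2, which is exactly the 0.525 survival rate shown above.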

darcula1993 avatar Dec 23 '20 05:12 darcula1993

I've observed the same thing as well.

xhluca avatar Dec 23 '20 17:12 xhluca

Qubvel's implementation does not calculate the total number of blocks correctly for configurations larger than B0.
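That is consistent with the numbers: for B0 the depth coefficient is 1.0, so the scaled and unscaled totals coincide and the bug is invisible. A quick check (helper name is illustrative):

```python
import math

def total_blocks(depth_coefficient, base_repeats=(1, 2, 2, 3, 3, 4, 1)):
    # Sum the per-stage repeats after depth scaling (ceil), as tf.keras does.
    return sum(int(math.ceil(depth_coefficient * r)) for r in base_repeats)

print(total_blocks(1.0))  # 16 -- equals the unscaled sum, so B0 looks correct
print(total_blocks(2.2))  # 39 -- B5: the correct denominator, not 16
```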

fmbahrt avatar Jan 02 '21 22:01 fmbahrt

In practice, it does perform better than the official one. -_-

innat avatar Jan 03 '21 07:01 innat

> Practically it does perform better than official. -_-

I've only observed better performance in one case, so I'm not sure it generalizes. In that case, the improvement does suggest that extremely low survival rates (<0.3) might be a good regularization approach.

xhluca avatar Jan 04 '21 15:01 xhluca

Well, I'm not sure; maybe I need to look again more carefully. In fact, I spent almost a week assuming there was probably some problem with my data loader while using the official EfficientNet, but when I switched to the non-official implementation, it was just fine.

innat avatar Jan 05 '21 08:01 innat