
Implemented ResNet18 and ResNet34

Open zaccharieramzi opened this issue 2 years ago • 42 comments

This should close this issue: https://github.com/keras-team/keras-applications/issues/151

which has the following duplicates:

  • https://github.com/keras-team/keras/issues/15269
  • https://github.com/keras-team/keras/issues/15494

I am not sure how to test this, which is why I am opening it as a draft PR. To keep the review easy I haven't implemented the V2 variants, and I haven't trained the networks yet, so there are no pretrained weights.

Note: this is a reopening of https://github.com/keras-team/keras/pull/16358, which I messed up by using the wrong emails in the commits.
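For reviewers, here is a minimal sketch of the V1 basic residual block implied by the layer names in the summaries below (`..._1_conv`, `..._2_bn`, `..._add`, `..._out`, and the `_0_conv`/`_0_bn` projection shortcut in the first block of each stage). The `basic_block` name and signature are illustrative, not necessarily what the PR uses:

```python
import tensorflow as tf
from tensorflow.keras import layers

def basic_block(x, filters, stride=1, conv_shortcut=False, name=None):
    """Two 3x3 convs with an identity (or 1x1 projection) shortcut.

    Illustrative sketch only; layer names follow the summaries below.
    """
    if conv_shortcut:
        # Projection shortcut: the `_0_conv`/`_0_bn` layers that appear
        # in the first block of each stage in the summaries.
        shortcut = layers.Conv2D(filters, 1, strides=stride, name=name + "_0_conv")(x)
        shortcut = layers.BatchNormalization(name=name + "_0_bn")(shortcut)
    else:
        shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", name=name + "_1_conv")(x)
    y = layers.BatchNormalization(name=name + "_1_bn")(y)
    y = layers.Activation("relu", name=name + "_1_relu")(y)
    y = layers.Conv2D(filters, 3, padding="same", name=name + "_2_conv")(y)
    y = layers.BatchNormalization(name=name + "_2_bn")(y)
    y = layers.Add(name=name + "_add")([shortcut, y])
    return layers.Activation("relu", name=name + "_out")(y)
```

ResNet18 stacks this block [2, 2, 2, 2] times per stage and ResNet34 [3, 4, 6, 3] times, in contrast to the three-conv bottleneck block used by ResNet50 and up.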

zaccharieramzi, Apr 05 '22 08:04

Adding the model summaries here for info:

ResNet18:

Model: "resnet18"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']              
                                )                                                                 
                                                                                                  
 conv1_bn (BatchNormalization)  (None, 112, 112, 64  256         ['conv1_conv[0][0]']             
                                )                                                                 
                                                                                                  
 conv1_relu (Activation)        (None, 112, 112, 64  0           ['conv1_bn[0][0]']               
                                )                                                                 
                                                                                                  
 pool1_pad (ZeroPadding2D)      (None, 114, 114, 64  0           ['conv1_relu[0][0]']             
                                )                                                                 
                                                                                                  
 pool1_pool (MaxPooling2D)      (None, 56, 56, 64)   0           ['pool1_pad[0][0]']              
                                                                                                  
 conv2_block1_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['pool1_pool[0][0]']             
                                                                                                  
 conv2_block1_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block1_0_conv (Conv2D)   (None, 56, 56, 64)   4160        ['pool1_pool[0][0]']             
                                                                                                  
 conv2_block1_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block1_1_relu[0][0]']    
                                                                                                  
 conv2_block1_0_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_add (Add)         (None, 56, 56, 64)   0           ['conv2_block1_0_bn[0][0]',      
                                                                  'conv2_block1_2_bn[0][0]']      
                                                                                                  
 conv2_block1_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block1_add[0][0]']       
                                                                                                  
 conv2_block2_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block1_out[0][0]']       
                                                                                                  
 conv2_block2_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block2_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block2_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block2_1_relu[0][0]']    
                                                                                                  
 conv2_block2_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block2_add (Add)         (None, 56, 56, 64)   0           ['conv2_block1_out[0][0]',       
                                                                  'conv2_block2_2_bn[0][0]']      
                                                                                                  
 conv2_block2_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block2_add[0][0]']       
                                                                                                  
 conv3_block1_1_conv (Conv2D)   (None, 28, 28, 128)  73856       ['conv2_block2_out[0][0]']       
                                                                                                  
 conv3_block1_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block1_0_conv (Conv2D)   (None, 28, 28, 128)  8320        ['conv2_block2_out[0][0]']       
                                                                                                  
 conv3_block1_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block1_1_relu[0][0]']    
                                                                                                  
 conv3_block1_0_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_add (Add)         (None, 28, 28, 128)  0           ['conv3_block1_0_bn[0][0]',      
                                                                  'conv3_block1_2_bn[0][0]']      
                                                                                                  
 conv3_block1_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block1_add[0][0]']       
                                                                                                  
 conv3_block2_1_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block1_out[0][0]']       
                                                                                                  
 conv3_block2_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block2_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block2_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block2_1_relu[0][0]']    
                                                                                                  
 conv3_block2_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block2_add (Add)         (None, 28, 28, 128)  0           ['conv3_block1_out[0][0]',       
                                                                  'conv3_block2_2_bn[0][0]']      
                                                                                                  
 conv3_block2_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block2_add[0][0]']       
                                                                                                  
 conv4_block1_1_conv (Conv2D)   (None, 14, 14, 256)  295168      ['conv3_block2_out[0][0]']       
                                                                                                  
 conv4_block1_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block1_0_conv (Conv2D)   (None, 14, 14, 256)  33024       ['conv3_block2_out[0][0]']       
                                                                                                  
 conv4_block1_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block1_1_relu[0][0]']    
                                                                                                  
 conv4_block1_0_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_add (Add)         (None, 14, 14, 256)  0           ['conv4_block1_0_bn[0][0]',      
                                                                  'conv4_block1_2_bn[0][0]']      
                                                                                                  
 conv4_block1_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block1_add[0][0]']       
                                                                                                  
 conv4_block2_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block1_out[0][0]']       
                                                                                                  
 conv4_block2_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block2_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block2_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block2_1_relu[0][0]']    
                                                                                                  
 conv4_block2_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block2_add (Add)         (None, 14, 14, 256)  0           ['conv4_block1_out[0][0]',       
                                                                  'conv4_block2_2_bn[0][0]']      
                                                                                                  
 conv4_block2_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block2_add[0][0]']       
                                                                                                  
 conv5_block1_1_conv (Conv2D)   (None, 7, 7, 512)    1180160     ['conv4_block2_out[0][0]']       
                                                                                                  
 conv5_block1_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block1_0_conv (Conv2D)   (None, 7, 7, 512)    131584      ['conv4_block2_out[0][0]']       
                                                                                                  
 conv5_block1_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block1_1_relu[0][0]']    
                                                                                                  
 conv5_block1_0_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_add (Add)         (None, 7, 7, 512)    0           ['conv5_block1_0_bn[0][0]',      
                                                                  'conv5_block1_2_bn[0][0]']      
                                                                                                  
 conv5_block1_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block1_add[0][0]']       
                                                                                                  
 conv5_block2_1_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block1_out[0][0]']       
                                                                                                  
 conv5_block2_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block2_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block2_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block2_1_relu[0][0]']    
                                                                                                  
 conv5_block2_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block2_add (Add)         (None, 7, 7, 512)    0           ['conv5_block1_out[0][0]',       
                                                                  'conv5_block2_2_bn[0][0]']      
                                                                                                  
 conv5_block2_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block2_add[0][0]']       
                                                                                                  
 avg_pool (GlobalAveragePooling  (None, 512)         0           ['conv5_block2_out[0][0]']       
 2D)                                                                                              
                                                                                                  
 predictions (Dense)            (None, 1000)         513000      ['avg_pool[0][0]']               
                                                                                                  
==================================================================================================
Total params: 11,708,328
Trainable params: 11,698,600
Non-trainable params: 9,728
__________________________________________________________________________________________________
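The `Param #` column can be spot-checked by hand: a `Conv2D` layer has `k*k*c_in*c_out` weights plus `c_out` biases, and `BatchNormalization` has 4 parameters per channel (gamma, beta, moving mean, moving variance). A few checks against the resnet18 summary above:

```python
def conv2d_params(k, c_in, c_out):
    # k x k kernel weights, plus one bias per output channel
    return k * k * c_in * c_out + c_out

assert conv2d_params(7, 3, 64) == 9472      # conv1_conv
assert conv2d_params(3, 64, 64) == 36928    # conv2_block1_1_conv
assert conv2d_params(1, 64, 64) == 4160     # conv2_block1_0_conv (projection)
assert conv2d_params(3, 64, 128) == 73856   # conv3_block1_1_conv
assert 4 * 64 == 256                        # conv1_bn
assert 512 * 1000 + 1000 == 513000          # predictions (Dense head)
```

The non-trainable 9,728 parameters are the batch-norm moving statistics (2 of the 4 per-channel parameters in each BN layer).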

ResNet34:

Model: "resnet34"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
==================================================================================================
 input_1 (InputLayer)           [(None, 224, 224, 3  0           []                               
                                )]                                                                
                                                                                                  
 conv1_pad (ZeroPadding2D)      (None, 230, 230, 3)  0           ['input_1[0][0]']                
                                                                                                  
 conv1_conv (Conv2D)            (None, 112, 112, 64  9472        ['conv1_pad[0][0]']              
                                )                                                                 
                                                                                                  
 conv1_bn (BatchNormalization)  (None, 112, 112, 64  256         ['conv1_conv[0][0]']             
                                )                                                                 
                                                                                                  
 conv1_relu (Activation)        (None, 112, 112, 64  0           ['conv1_bn[0][0]']               
                                )                                                                 
                                                                                                  
 pool1_pad (ZeroPadding2D)      (None, 114, 114, 64  0           ['conv1_relu[0][0]']             
                                )                                                                 
                                                                                                  
 pool1_pool (MaxPooling2D)      (None, 56, 56, 64)   0           ['pool1_pad[0][0]']              
                                                                                                  
 conv2_block1_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['pool1_pool[0][0]']             
                                                                                                  
 conv2_block1_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block1_0_conv (Conv2D)   (None, 56, 56, 64)   4160        ['pool1_pool[0][0]']             
                                                                                                  
 conv2_block1_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block1_1_relu[0][0]']    
                                                                                                  
 conv2_block1_0_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block1_add (Add)         (None, 56, 56, 64)   0           ['conv2_block1_0_bn[0][0]',      
                                                                  'conv2_block1_2_bn[0][0]']      
                                                                                                  
 conv2_block1_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block1_add[0][0]']       
                                                                                                  
 conv2_block2_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block1_out[0][0]']       
                                                                                                  
 conv2_block2_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block2_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block2_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block2_1_relu[0][0]']    
                                                                                                  
 conv2_block2_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block2_add (Add)         (None, 56, 56, 64)   0           ['conv2_block1_out[0][0]',       
                                                                  'conv2_block2_2_bn[0][0]']      
                                                                                                  
 conv2_block2_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block2_add[0][0]']       
                                                                                                  
 conv2_block3_1_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block2_out[0][0]']       
                                                                                                  
 conv2_block3_1_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block3_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block3_1_relu (Activatio  (None, 56, 56, 64)  0           ['conv2_block3_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv2_block3_2_conv (Conv2D)   (None, 56, 56, 64)   36928       ['conv2_block3_1_relu[0][0]']    
                                                                                                  
 conv2_block3_2_bn (BatchNormal  (None, 56, 56, 64)  256         ['conv2_block3_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv2_block3_add (Add)         (None, 56, 56, 64)   0           ['conv2_block2_out[0][0]',       
                                                                  'conv2_block3_2_bn[0][0]']      
                                                                                                  
 conv2_block3_out (Activation)  (None, 56, 56, 64)   0           ['conv2_block3_add[0][0]']       
                                                                                                  
 conv3_block1_1_conv (Conv2D)   (None, 28, 28, 128)  73856       ['conv2_block3_out[0][0]']       
                                                                                                  
 conv3_block1_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block1_0_conv (Conv2D)   (None, 28, 28, 128)  8320        ['conv2_block3_out[0][0]']       
                                                                                                  
 conv3_block1_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block1_1_relu[0][0]']    
                                                                                                  
 conv3_block1_0_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block1_add (Add)         (None, 28, 28, 128)  0           ['conv3_block1_0_bn[0][0]',      
                                                                  'conv3_block1_2_bn[0][0]']      
                                                                                                  
 conv3_block1_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block1_add[0][0]']       
                                                                                                  
 conv3_block2_1_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block1_out[0][0]']       
                                                                                                  
 conv3_block2_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block2_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block2_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block2_1_relu[0][0]']    
                                                                                                  
 conv3_block2_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block2_add (Add)         (None, 28, 28, 128)  0           ['conv3_block1_out[0][0]',       
                                                                  'conv3_block2_2_bn[0][0]']      
                                                                                                  
 conv3_block2_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block2_add[0][0]']       
                                                                                                  
 conv3_block3_1_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block2_out[0][0]']       
                                                                                                  
 conv3_block3_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block3_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block3_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block3_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block3_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block3_1_relu[0][0]']    
                                                                                                  
 conv3_block3_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block3_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block3_add (Add)         (None, 28, 28, 128)  0           ['conv3_block2_out[0][0]',       
                                                                  'conv3_block3_2_bn[0][0]']      
                                                                                                  
 conv3_block3_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block3_add[0][0]']       
                                                                                                  
 conv3_block4_1_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block3_out[0][0]']       
                                                                                                  
 conv3_block4_1_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block4_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block4_1_relu (Activatio  (None, 28, 28, 128)  0          ['conv3_block4_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv3_block4_2_conv (Conv2D)   (None, 28, 28, 128)  147584      ['conv3_block4_1_relu[0][0]']    
                                                                                                  
 conv3_block4_2_bn (BatchNormal  (None, 28, 28, 128)  512        ['conv3_block4_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv3_block4_add (Add)         (None, 28, 28, 128)  0           ['conv3_block3_out[0][0]',       
                                                                  'conv3_block4_2_bn[0][0]']      
                                                                                                  
 conv3_block4_out (Activation)  (None, 28, 28, 128)  0           ['conv3_block4_add[0][0]']       
                                                                                                  
 conv4_block1_1_conv (Conv2D)   (None, 14, 14, 256)  295168      ['conv3_block4_out[0][0]']       
                                                                                                  
 conv4_block1_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block1_0_conv (Conv2D)   (None, 14, 14, 256)  33024       ['conv3_block4_out[0][0]']       
                                                                                                  
 conv4_block1_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block1_1_relu[0][0]']    
                                                                                                  
 conv4_block1_0_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block1_add (Add)         (None, 14, 14, 256)  0           ['conv4_block1_0_bn[0][0]',      
                                                                  'conv4_block1_2_bn[0][0]']      
                                                                                                  
 conv4_block1_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block1_add[0][0]']       
                                                                                                  
 conv4_block2_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block1_out[0][0]']       
                                                                                                  
 conv4_block2_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block2_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block2_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block2_1_relu[0][0]']    
                                                                                                  
 conv4_block2_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block2_add (Add)         (None, 14, 14, 256)  0           ['conv4_block1_out[0][0]',       
                                                                  'conv4_block2_2_bn[0][0]']      
                                                                                                  
 conv4_block2_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block2_add[0][0]']       
                                                                                                  
 conv4_block3_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block2_out[0][0]']       
                                                                                                  
 conv4_block3_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block3_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block3_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block3_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block3_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block3_1_relu[0][0]']    
                                                                                                  
 conv4_block3_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block3_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block3_add (Add)         (None, 14, 14, 256)  0           ['conv4_block2_out[0][0]',       
                                                                  'conv4_block3_2_bn[0][0]']      
                                                                                                  
 conv4_block3_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block3_add[0][0]']       
                                                                                                  
 conv4_block4_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block3_out[0][0]']       
                                                                                                  
 conv4_block4_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block4_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block4_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block4_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block4_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block4_1_relu[0][0]']    
                                                                                                  
 conv4_block4_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block4_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block4_add (Add)         (None, 14, 14, 256)  0           ['conv4_block3_out[0][0]',       
                                                                  'conv4_block4_2_bn[0][0]']      
                                                                                                  
 conv4_block4_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block4_add[0][0]']       
                                                                                                  
 conv4_block5_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block4_out[0][0]']       
                                                                                                  
 conv4_block5_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block5_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block5_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block5_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block5_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block5_1_relu[0][0]']    
                                                                                                  
 conv4_block5_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block5_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block5_add (Add)         (None, 14, 14, 256)  0           ['conv4_block4_out[0][0]',       
                                                                  'conv4_block5_2_bn[0][0]']      
                                                                                                  
 conv4_block5_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block5_add[0][0]']       
                                                                                                  
 conv4_block6_1_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block5_out[0][0]']       
                                                                                                  
 conv4_block6_1_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block6_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block6_1_relu (Activatio  (None, 14, 14, 256)  0          ['conv4_block6_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv4_block6_2_conv (Conv2D)   (None, 14, 14, 256)  590080      ['conv4_block6_1_relu[0][0]']    
                                                                                                  
 conv4_block6_2_bn (BatchNormal  (None, 14, 14, 256)  1024       ['conv4_block6_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv4_block6_add (Add)         (None, 14, 14, 256)  0           ['conv4_block5_out[0][0]',       
                                                                  'conv4_block6_2_bn[0][0]']      
                                                                                                  
 conv4_block6_out (Activation)  (None, 14, 14, 256)  0           ['conv4_block6_add[0][0]']       
                                                                                                  
 conv5_block1_1_conv (Conv2D)   (None, 7, 7, 512)    1180160     ['conv4_block6_out[0][0]']       
                                                                                                  
 conv5_block1_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block1_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block1_0_conv (Conv2D)   (None, 7, 7, 512)    131584      ['conv4_block6_out[0][0]']       
                                                                                                  
 conv5_block1_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block1_1_relu[0][0]']    
                                                                                                  
 conv5_block1_0_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_0_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block1_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block1_add (Add)         (None, 7, 7, 512)    0           ['conv5_block1_0_bn[0][0]',      
                                                                  'conv5_block1_2_bn[0][0]']      
                                                                                                  
 conv5_block1_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block1_add[0][0]']       
                                                                                                  
 conv5_block2_1_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block1_out[0][0]']       
                                                                                                  
 conv5_block2_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block2_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block2_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block2_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block2_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block2_1_relu[0][0]']    
                                                                                                  
 conv5_block2_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block2_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block2_add (Add)         (None, 7, 7, 512)    0           ['conv5_block1_out[0][0]',       
                                                                  'conv5_block2_2_bn[0][0]']      
                                                                                                  
 conv5_block2_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block2_add[0][0]']       
                                                                                                  
 conv5_block3_1_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block2_out[0][0]']       
                                                                                                  
 conv5_block3_1_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block3_1_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block3_1_relu (Activatio  (None, 7, 7, 512)   0           ['conv5_block3_1_bn[0][0]']      
 n)                                                                                               
                                                                                                  
 conv5_block3_2_conv (Conv2D)   (None, 7, 7, 512)    2359808     ['conv5_block3_1_relu[0][0]']    
                                                                                                  
 conv5_block3_2_bn (BatchNormal  (None, 7, 7, 512)   2048        ['conv5_block3_2_conv[0][0]']    
 ization)                                                                                         
                                                                                                  
 conv5_block3_add (Add)         (None, 7, 7, 512)    0           ['conv5_block2_out[0][0]',       
                                                                  'conv5_block3_2_bn[0][0]']      
                                                                                                  
 conv5_block3_out (Activation)  (None, 7, 7, 512)    0           ['conv5_block3_add[0][0]']       
                                                                                                  
 avg_pool (GlobalAveragePooling  (None, 512)         0           ['conv5_block3_out[0][0]']       
 2D)                                                                                              
                                                                                                  
 predictions (Dense)            (None, 1000)         513000      ['avg_pool[0][0]']               
                                                                                                  
==================================================================================================
Total params: 21,827,624
Trainable params: 21,810,472
Non-trainable params: 17,152
__________________________________________________________________________________________________

It turns out that the parameter counts do not match PyTorch's numbers, which is something I did not understand at first. For info, the same happens for ResNet50 (already implemented), as you can see in the following colab: https://colab.research.google.com/drive/1RCmWkpwuKFapzzPacbqodxz0mqt9Igft?usp=sharing

This appears to be due to the fact that TF's convolutions have biases while PyTorch's do not, and also to how PyTorch counts the batch norm's parameters.

However, the last dimension before the dense layer matches, and the size (WH) of the feature maps matches as well.

zaccharieramzi avatar Apr 06 '22 09:04 zaccharieramzi

So 2 things w.r.t. the comparison with PyTorch:

  • indeed, the only difference in the trainable parameter count is the use of biases in Keras. Imo, there shouldn't be any bias in the convolutions given that an affine BatchNorm comes just afterwards. Maybe an option allowing one to use it or not would be nice; I am going to implement it.
  • the batch norm in PyTorch indeed doesn't count the running stats as parameters, but as buffers.
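A minimal check of the first point, assuming TensorFlow is installed (the numbers are for a 3x3 conv on a 64-channel input):

```python
import tensorflow as tf

# A single 3x3 conv on 64 channels: Keras includes a bias vector by default,
# which PyTorch's ResNet convs omit (bias=False) since an affine BN follows.
with_bias = tf.keras.layers.Conv2D(64, 3, use_bias=True)
without_bias = tf.keras.layers.Conv2D(64, 3, use_bias=False)
x = tf.zeros((1, 8, 8, 64))
with_bias(x)     # build the layers so the weights exist
without_bias(x)

print(with_bias.count_params())     # 3*3*64*64 + 64 = 36928
print(without_bias.count_params())  # 3*3*64*64      = 36864
```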

Side note: the default momentum values for the batch norm in Keras and PyTorch are not equivalent: PyTorch's default of 0.1 corresponds to 0.9 in Keras's convention (Keras weights the moving average, PyTorch weights the new batch statistic), while Keras defaults to 0.99. This, coupled with the use of biases in TF, means that the training will differ between the two frameworks.

I think it would be nice to implement the possibility to change the batch norm momentum to fit PyTorch's one, I am going to open a new issue and a new PR about this.
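For reference, a sketch of setting the Keras batch norm to match (the convention difference is spelled out in the comments):

```python
import tensorflow as tf

# Keras BN: moving_mean = momentum * moving_mean + (1 - momentum) * batch_mean.
# PyTorch weights the *new* batch statistic instead, so its default momentum of
# 0.1 corresponds to momentum = 0.9 in Keras terms (Keras defaults to 0.99).
bn = tf.keras.layers.BatchNormalization(momentum=0.9)
x = tf.fill((4, 8, 8, 3), 5.0)  # constant input, so the batch mean is 5
bn(x, training=True)            # one training step updates the moving stats
# moving_mean is now 0.9 * 0 + 0.1 * 5 = 0.5
```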

zaccharieramzi avatar Apr 06 '22 16:04 zaccharieramzi

Thanks for the PR. Could you make sure the weights for ImageNet are also available? Also, please make sure to run the evaluation with the ImageNet eval set, and report the accuracy numbers in the PR.

qlzh727 avatar Apr 06 '22 20:04 qlzh727

@qlzh727 should I train the models also for the no bias case?

Also, could you point me to the scripts that were used to train the bigger models? I couldn't find them, but maybe I didn't look hard enough.

zaccharieramzi avatar Apr 07 '22 07:04 zaccharieramzi

@qlzh727 I was looking for an official script to train a classification model on imagenet, and stumbled upon this: https://github.com/tensorflow/models

There is a typical example for training classification models, but I also noticed that there is already an implementation of ResNet without the bias and with the basic blocks here. I don't think the weights are available, but now my question is more: should we re-implement it here given it's already present in the other repo?

Basically, is there a difference in concern between keras applications and tensorflow models?

zaccharieramzi avatar Apr 07 '22 11:04 zaccharieramzi

I just noticed that one additional difference with the PyTorch implementation (in both keras applications and tensorflow models) is the initialization strategy for the convolution weights.

| Framework | Init strategy |
|---|---|
| PyTorch | He normal, `nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")` |
| Keras | Glorot uniform, default of `Conv2D` |
| TensorFlow | Variance scaling, at least by default |
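A sketch of reproducing torchvision's init in Keras, assuming `VarianceScaling` with the `'untruncated_normal'` distribution to match `kaiming_normal_` exactly:

```python
import tensorflow as tf

# torchvision initializes conv weights with kaiming_normal_(mode="fan_out",
# nonlinearity="relu"); the Keras equivalent is a VarianceScaling initializer
# (Conv2D's default is Glorot uniform instead).
he_fan_out = tf.keras.initializers.VarianceScaling(
    scale=2.0, mode='fan_out', distribution='untruncated_normal')
conv = tf.keras.layers.Conv2D(
    64, 3, kernel_initializer=he_fan_out, use_bias=False)
conv.build((None, 8, 8, 64))
# The kernel's std should then be close to sqrt(2 / fan_out) = sqrt(2 / (3*3*64)).
```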

zaccharieramzi avatar Apr 07 '22 12:04 zaccharieramzi

> @qlzh727 should I train the models also for the no bias case?
>
> Also, could you point me to the scripts that were used to train the bigger models? I couldn't find them, but maybe I didn't look hard enough.

We currently don't have any script for retraining the model. Keras Applications is used for fine-tuning, and we usually reuse weights/checkpoints from the original paper (if they were published).

qlzh727 avatar Apr 07 '22 16:04 qlzh727

> @qlzh727 I was looking for an official script to train a classification model on imagenet, and stumbled upon this: https://github.com/tensorflow/models
>
> There is a typical example for training classification models, but I also noticed that there is already an implementation of ResNet without the bias and with the basic blocks here. I don't think the weights are available, but now my question is more: should we re-implement it here given it's already present in this other repo?
>
> Basically, is there a difference in concern between keras applications and tensorflow models?

tensorflow-models is more focused on end-to-end solutions, and if that's already available in tf-models, we can probably skip it here in keras.applications (given that you can't get any existing weights).

qlzh727 avatar Apr 07 '22 16:04 qlzh727

Well, the original paper did train both ResNet 18 and 34, but I am not sure in which framework, or even whether the weights are available. Do you know where you obtained the ResNet 50 weights?

Another solution would be to translate the ones from PyTorch, potentially forcing the bias to 0 for the original implementations with bias. Wdyt?

EDIT

One last thing is that if we do not include the resnet 18 and 34 here, it might still be nice to have a pointer to tensorflow/models, in order for people looking for an implementation to find it easily (this is not the case rn, see https://github.com/keras-team/keras-applications/issues/151)

zaccharieramzi avatar Apr 07 '22 17:04 zaccharieramzi

> Do you know where you obtained the ResNet 50 weights?

I ported them from the original Caffe implementation IIRC.

> Another solution would be to translate the ones from PyTorch, potentially forcing the bias to 0 for the original implementations with bias. Wdyt?

Sure, if you can produce an ImageNet weights checkpoint (under proper licensing), that works.

> We currently don't have any script for retraining the model

Hopefully we'll have such a script in KerasCV soon.

fchollet avatar Apr 08 '22 21:04 fchollet

@fchollet thanks for your answer.

Do you know how I can check the license for the PyTorch weights? In their repo they have a vague statement regarding this.

> The pre-trained models provided in this library may have their own licenses or terms and conditions derived from the dataset used for training. It is your responsibility to determine whether you have permission to use the models for your use case.

Does it mean that the license is "only" the ImageNet license, and therefore I can use these weights and port them here? Or do you think I need to ask for extra specifications?

zaccharieramzi avatar Apr 08 '22 21:04 zaccharieramzi

This just refers to the dataset as far as I can tell. Seems fine to port them.

fchollet avatar Apr 10 '22 04:04 fchollet

@fchollet @qlzh727 Ok great!

I am currently in the process of porting PyTorch weights to Keras, and I am running into a bit of an issue regarding strided convolutions. I have created a colab notebook illustrating the issue: https://colab.research.google.com/drive/1iOriG0i2tGtQENVnsXrGiQ9VZlAgeOEM?usp=sharing

Basically for the same weights and the same inputs, PyTorch and TensorFlow do not give the same outputs... This is only the case for stride = 2, and not stride = 1, where the outputs are the same.

I am going to investigate, but if you happen to know anything about this potential issue, let me know.

It seems that I am doing something wrong with TF, given how weirdly huge the output values are for stride = 2.

zaccharieramzi avatar Apr 13 '22 17:04 zaccharieramzi

> Basically for the same weights and the same inputs, PyTorch and TensorFlow do not give the same outputs...

This may be related to padding defaults. See if you can change the padding mode.

fchollet avatar Apr 13 '22 23:04 fchollet

@fchollet indeed it had to do with the padding, but more precisely with which side the striding starts on: TF's 'same' padding pads only on the bottom/right for even input sizes, whereas PyTorch pads on all sides. Therefore, to compensate for the difference between the two frameworks, we need to add explicit zero padding before the strided convolutions in the basic blocks, in order to be able to port PyTorch weights.
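A minimal sketch of the fix (the helper name is hypothetical): padding explicitly and running the convolution with `padding='valid'` reproduces PyTorch's stride-2 sampling grid, which TF's `padding='same'` does not for even input sizes.

```python
import tensorflow as tf

# TF's padding='same' with stride 2 pads only on the bottom/right for even
# inputs, while PyTorch's Conv2d(padding=1) pads all four sides, so the two
# sampling grids differ. Explicit padding + 'valid' matches PyTorch's grid.
def strided_conv_like_torch(x, filters):
    x = tf.keras.layers.ZeroPadding2D(padding=((1, 1), (1, 1)))(x)
    return tf.keras.layers.Conv2D(
        filters, 3, strides=2, padding='valid', use_bias=False)(x)

x = tf.random.normal((1, 224, 224, 3))
y = strided_conv_like_torch(x, 64)
print(y.shape)  # (1, 112, 112, 64)
```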

I now have (verified with a test) the same output for the same inputs between PyTorch ResNet 18 and 34 and the implementations in this branch.

@qlzh727 Could it be possible for me to give you the weights so that you store them on GCP? I can give you a WeTransfer link of the h5 files.

I will also try to run the evaluations on ImageNet, but I am quite confident since the output is the same as for PyTorch. Do you have an evaluation script I could use by any chance? Otherwise, I'll just craft my own.

Also, is it the evaluation on the test set or some other set? Actually, I guess the validation set, given this quote in the docs:

> The top-1 and top-5 accuracy refers to the model's performance on the ImageNet validation dataset.

Side note: I have a script for porting PyTorch weights into a Keras model now (architecture-specific ofc, but it could be interesting for some folks I guess in some situations). Is this something you would like to see cleaned up in the repo somewhere? Otherwise I'll just make a gist out of it.
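Not the actual script, but a sketch of the core layout conversion such a port relies on (helper names are hypothetical): PyTorch stores conv kernels as `(out, in, kH, kW)` while Keras uses `(kH, kW, in, out)`, and batch norm parameters map one-to-one (`weight → gamma`, `bias → beta`, `running_mean → moving_mean`, `running_var → moving_variance`).

```python
import numpy as np

# Hypothetical helper: convert a PyTorch conv weight (out, in, kH, kW)
# to the Keras kernel layout (kH, kW, in, out).
def torch_conv_to_keras(torch_weight):
    return np.transpose(torch_weight, (2, 3, 1, 0))

# If the target Keras conv keeps use_bias=True, the bias missing from the
# PyTorch checkpoint can simply be zeroed out:
def keras_conv_weights(torch_weight):
    kernel = torch_conv_to_keras(torch_weight)
    return [kernel, np.zeros(kernel.shape[-1], dtype=kernel.dtype)]

w = np.random.randn(64, 3, 7, 7)  # conv1 of a ResNet, in PyTorch layout
kernel, bias = keras_conv_weights(w)
print(kernel.shape, bias.shape)  # (7, 7, 3, 64) (64,)
```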

zaccharieramzi avatar Apr 14 '22 09:04 zaccharieramzi

@qlzh727 These are the results I obtain with my port of the PyTorch weights on the ImageNet validation set:

| ResNet size | Top-1 acc | Top-5 acc |
|---|---|---|
| 18 | 0.67804 | 0.88187 |
| 34 | 0.71855 | 0.90719 |

I used the following data pipeline inspired by the PyTorch docs (I removed the num_parallel_calls to make it more readable):

ds = tfds.load(
    'imagenet2012',
    split='validation',
    as_supervised=True,
).map(
    lambda x, y: (
        tf.image.resize_with_crop_or_pad(tf.image.resize(x, (256, 256)), 224, 224),
        y,
    ),
).batch(
    64,
).map(
    lambda x, y: (
        tf.keras.applications.imagenet_utils.preprocess_input(x, mode='torch'),
        tf.one_hot(y, 1000),
    ),
)

To be honest, these numbers are slightly lower than the ones reported in the PyTorch docs:

| ResNet size | Top-1 acc | Top-5 acc |
|---|---|---|
| 18 | 0.6976 | 0.8908 |
| 34 | 0.733 | 0.9142 |

I have verified that the network performs the same function using random inputs though. I don't know if this is satisfying enough.
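One possible contributor to the gap (an assumption, not something verified here): torchvision's eval transform resizes the *shorter side* to 256 and then center-crops to 224x224, whereas `tf.image.resize(x, (256, 256))` in the pipeline above squashes the image to a square. A sketch of the aspect-preserving variant:

```python
import tensorflow as tf

# Resize the shorter side to `resize_to` (preserving aspect ratio), then
# center-crop a `crop_to` x `crop_to` patch, mimicking torchvision's eval
# transform Resize(256) + CenterCrop(224).
def resize_then_center_crop(image, resize_to=256, crop_to=224):
    shape = tf.shape(image)
    h, w = shape[0], shape[1]
    scale = resize_to / tf.cast(tf.minimum(h, w), tf.float32)
    new_h = tf.cast(tf.round(tf.cast(h, tf.float32) * scale), tf.int32)
    new_w = tf.cast(tf.round(tf.cast(w, tf.float32) * scale), tf.int32)
    image = tf.image.resize(image, (new_h, new_w))
    return tf.image.resize_with_crop_or_pad(image, crop_to, crop_to)

out = resize_then_center_crop(tf.zeros((480, 640, 3)))
print(out.shape)  # (224, 224, 3)
```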

zaccharieramzi avatar Apr 14 '22 13:04 zaccharieramzi

I am slightly concerned that the need for manual insertion of padding operations is making the model slower, while not being necessary (it's only needed in order to be able to use the PyTorch weights checkpoint). Can you check if there is added overhead due to it on GPU or CPU? If there is, then it would seem preferable to train our own checkpoint.

fchollet avatar Apr 14 '22 16:04 fchollet

Do you mean overhead compared to PyTorch, or compared to a padding within the convolution op? Anyway, I can test both, but for GPU I'll have to check if it fits on Colab's GPUs, because I just arrived at my new institution and I don't have access to a GPU yet (although that should change rather soon).

I'll let you know soon enough.

zaccharieramzi avatar Apr 14 '22 20:04 zaccharieramzi

@fchollet here are the overhead results on just comparing with and without padding in Keras:

| ResNet size / hardware | No padding | With padding |
|------------------------|------------|--------------|
| 18-CPU                 | 0.71841    | 0.7608       |
| 18-GPU                 | 0.03884    | 0.03953      |
| 34-CPU                 | 1.3042     | 1.3201       |
| 34-GPU                 | 0.06858    | 0.06718      |

These are step times when running on a 32x224x224x3 batch, after a warm-up. The CPU tests were done on my laptop; the GPU tests were done on Colab.
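
The timing methodology can be sketched as follows (a generic helper with a toy stand-in workload, not the actual benchmark script; on GPU one would additionally need to synchronize, e.g. by fetching the result to host, so that asynchronous dispatch is not mistaken for compute time):

```python
import time

def time_call(fn, *args, warmup=3, repeats=10):
    # Discard warm-up calls (tracing, autotuning, cache effects), then report
    # the median wall-clock time of the remaining calls.
    for _ in range(warmup):
        fn(*args)
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    times.sort()
    return times[len(times) // 2]

# Toy stand-in; the real benchmark called the model on a 32x224x224x3 batch.
print(f"{time_call(sum, range(100_000)):.6f} s")
```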

I am not sure how significant that overhead is. In my case I would like the option to skip this padding when training from scratch, but I guess a lot of folks would be happy just to have the baseline even if it is not as fast.

Maybe since I cannot train the model straight away (and I would need to validate the training procedure with someone), it's okay to integrate the models as is and already have the flag, and in a second PR remove the flag with the trained weights?

zaccharieramzi avatar Apr 14 '22 22:04 zaccharieramzi

> Maybe since I cannot train the model straight away (and I would need to validate the training procedure with someone), it's okay to integrate the models as is and already have the flag, and in a second PR remove the flag with the trained weights?

Yes, we could do that. Thanks for checking the step timing -- the padding overhead doesn't look so bad.

We can merge this PR and then train the model and replace the weights.

Would you need help with training the model? The main difficulty is finding the augmentation configuration, regularization configuration, and the learning rate schedule. If you have these (e.g. from another implementation) it's fairly straightforward to train the model on the Colab TPU runtime.

fchollet avatar Apr 17 '22 03:04 fchollet

Ok I'll just add the flag before merging.

I have to say I hadn't thought about the possibility to use TPUs on Colab. It's true that it might be enough.

But you are right that the reason I was asking for help was to figure out all the training hyperparameters. Maybe going through the original paper will give me all the info I need; I doubt it, but I can always check.

zaccharieramzi avatar Apr 17 '22 08:04 zaccharieramzi

> I have to say I hadn't thought about the possibility to use TPUs on Colab. It's true that it might be enough.

I believe it is. Happy to help you set it up, if you're interested!

fchollet avatar Apr 17 '22 10:04 fchollet

Thanks so much for the offer. I am going to try to get my hands dirty by myself a bit first, then I might come back asking for your help if I need anything re GCS or the data pipeline optimization.

zaccharieramzi avatar Apr 17 '22 18:04 zaccharieramzi

@qlzh727 you can find the weights ported from PyTorch in this WeTransfer link. And the evaluation is in this comment.

@fchollet I added the manual padding flag, so after this PR is merged I can open an issue asking to retrain the model weights without using the manual padding. Maybe you can help me there to set up the TPU training, because I am struggling with the data pipeline (see here).

zaccharieramzi avatar Apr 18 '22 00:04 zaccharieramzi

@zaccharieramzi, could you try to attach the weights to the PR, or push them to your GitHub fork? What's the size of the weight files?

qlzh727 avatar Apr 18 '22 17:04 qlzh727

@qlzh727 I am not sure what you mean by attaching the weights to the PR. I cannot, for example, put them in a comment; I think it's because the file is too heavy for GitHub (the zip with the two weight files is 119M). What's the problem with WeTransfer?

I am not sure also where I should push them in the fork... I see that the other weights are all in GCS.

zaccharieramzi avatar Apr 18 '22 17:04 zaccharieramzi

> @qlzh727 I am not sure what you mean by attaching the weights to the PR. I cannot for example put them in a comment. I think it's because it's too heavy as a file for GitHub (the zip file for the 2 weights file is 119M). What's the problem with WeTransfer?
>
> I am not sure also where I should push them in the fork... I see that the other weights are all in GCS.

I see. I guess 100M is probably too big here (for small files, e.g. less than 25M, you can drag and drop them into the comment, and they will be added as an attachment).

I would like the files to be properly tracked, so we have a stable place to refer to them. Could you try to add the weight files to your GitHub repo, so that we can retrieve them from a commit?

qlzh727 avatar Apr 18 '22 18:04 qlzh727

@qlzh727 Ok I think I did what you asked by creating 2 commits: one that adds the weights and one that removes them.

Tell me if it doesn't work for you.

zaccharieramzi avatar Apr 18 '22 18:04 zaccharieramzi

Thanks. Will take a look.

qlzh727 avatar Apr 18 '22 20:04 qlzh727

Note that placing files in git history can be expensive. It's often a good idea to upload files as "release artifacts" of a new GitHub release of the project.

fchollet avatar Apr 19 '22 01:04 fchollet

I felt that here it was not too much of a problem since we were going to squash eventually so the heavy commits would not be in the git history.

But happy to revise the git history and have the weights as release artifacts.

zaccharieramzi avatar Apr 19 '22 07:04 zaccharieramzi

Really looking forward to this getting in! :rocket: :fire:

KaleabTessera avatar Apr 23 '22 15:04 KaleabTessera

@fchollet I finally got around to making a TPU-ImageNet-ResNet training work on Colab.

With a ResNet-50, each epoch takes ~20 minutes. This will probably be much shorter for ResNet-18 and ResNet-34. So once this PR is merged, I can train them and provide the weights in a future PR.

However, I think my script could be much faster: I am not using `steps_per_execution` in `model.compile`, because if I do the loss becomes NaN... If you are still up for helping me set this up, I am interested.

zaccharieramzi avatar Apr 26 '22 13:04 zaccharieramzi

Thanks for the change. I was able to verify the weights with the model. There is one particular issue with preprocessing. Your sample code was using

```python
tf.keras.applications.imagenet_utils.preprocess_input(image, mode='torch')
```

which is different from `tf.keras.applications.resnet.preprocess_input(image)`. If we use the latter, the accuracy drops to less than 1%.

I think this will be a big issue, since the preprocessing logic lives outside the model and we can't easily warn users about the range of values we expect. I think we should consolidate all the weights into one format (it seems that all the ResNet v1 weights were converted from Caffe).
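
To make the mismatch concrete, here is my reading of the two modes as a numpy sketch (treat the constants as assumptions to be checked against `imagenet_utils`, not as a reference implementation): `'torch'` rescales to [0, 1] and normalizes with the ImageNet per-channel mean/std in RGB order, while `'caffe'` flips to BGR and subtracts per-channel means without rescaling.

```python
import numpy as np

def preprocess_torch(x):
    # Scale to [0, 1], then normalize with ImageNet per-channel mean/std (RGB order).
    x = x / 255.0
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    return (x - mean) / std

def preprocess_caffe(x):
    # Flip RGB -> BGR and subtract per-channel means; no rescaling.
    x = x[..., ::-1]
    mean = np.array([103.939, 116.779, 123.68])
    return x - mean

x = np.full((1, 2, 2, 3), 128.0)
print(preprocess_torch(x)[0, 0, 0])  # roughly 0.07, 0.21, 0.43
print(preprocess_caffe(x)[0, 0, 0])  # 24.061, 11.221, 4.32
```

Feeding inputs prepared for one convention into weights trained under the other mangles both the scale and the channel order, which is consistent with the near-chance accuracy observed.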

qlzh727 avatar May 10 '22 20:05 qlzh727

@qlzh727 Indeed, since I am porting from PyTorch, I needed to use their preprocessing. I was not able to find Caffe weights for the ResNet-34, and the ResNet-18 weights appear to be only available here.

Here are my tentative answers:

- Since we wanted to retrain the models anyway (cf. this comment), it's only going to be a temporary issue. We can simply document it well, in particular in the model and preprocessing docs. There could, by the way, be a `tf.keras.applications.resnet18.preprocess_input`, similar to what exists for `resnet50`.
- In the current state, we could apply the preprocessing correction inside the model, before retraining.

If, however, you have the Caffe weights for both models at your disposal (and, by any chance, the script used to port them), I can definitely do the porting and the checks.

zaccharieramzi avatar May 11 '22 08:05 zaccharieramzi

I just found out something about the way torch applies batch norm at eval time that might explain the difference in accuracy I noticed here.

You can read about it here.
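
I have not reproduced the specifics of that article here, but as an illustration of the kind of discrepancy that can creep in when porting (whether it is the one the article describes, I have not checked): at eval time, batch norm uses the stored running statistics, so even a different epsilon default (PyTorch's `BatchNorm2d` uses 1e-5, while Keras `BatchNormalization` defaults to 1e-3) shifts every activation slightly. A numpy sketch:

```python
import numpy as np

def bn_eval(x, gamma, beta, moving_mean, moving_var, eps):
    # Inference-time batch norm: uses stored running statistics, not batch statistics.
    return gamma * (x - moving_mean) / np.sqrt(moving_var + eps) + beta

x = np.array([1.0, 2.0, 3.0])
stats = (np.ones(3), np.zeros(3), np.zeros(3), np.ones(3))  # gamma, beta, mean, var
out_small_eps = bn_eval(x, *stats, eps=1e-5)  # PyTorch BatchNorm default
out_large_eps = bn_eval(x, *stats, eps=1e-3)  # Keras BatchNormalization default
print(np.max(np.abs(out_small_eps - out_large_eps)))  # small but nonzero
```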

zaccharieramzi avatar May 14 '22 15:05 zaccharieramzi

Any progress on this? This would be really great to have!

KaleabTessera avatar Jun 30 '22 19:06 KaleabTessera

@zaccharieramzi Can you please resolve conflicts? Thank you!

gbaned avatar Jul 06 '22 13:07 gbaned

@gbaned should be done

zaccharieramzi avatar Jul 06 '22 14:07 zaccharieramzi

Sorry for the long wait. Since end users could easily miss the preprocessing API for the PyTorch format, how about we include the preprocessing as part of the model and control it via an `include_preprocessing` flag? We have taken this approach for several other models in applications.

qlzh727 avatar Aug 08 '22 19:08 qlzh727

> Sorry for the long wait, since end user could easily miss the preprocess API with pytorch format, how about we include the preprocess as part of the model, and control it via a include_preprocessing flag on the model. We have take this approach for several other models in the applications.

Since ResNet18/34 requires different input preprocessing than the other ResNets, we would probably need to retrain these weights. Let's migrate this to a PR on KerasCV: please send a pull request to KerasCV and place the model in the models package:

https://github.com/keras-team/keras-cv/tree/master/keras_cv/models

From there, we can retrain the models.

LukeWood avatar Aug 25 '22 17:08 LukeWood