
layers.pop no longer removes the last layer

Open PolarBean opened this issue 2 years ago • 14 comments

Hi,

I am stuck on an issue where layers.pop() no longer removes the last layer. My workaround was to use _layers.pop(), but that has now been deprecated. I am aware that the new approach is to take the output of Model.layers[-2] (or whichever layer you want as the output) and define a new model with that as the output. The problem is that it is now impossible to load weights that were saved by a model which used the original _layers.pop() method to remove the last n layers. Is there any workaround to get the old _layers.pop() behaviour back?
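
For reference, the new slicing approach I am referring to looks roughly like this (a minimal sketch; the layer index is just illustrative):

from tensorflow.keras import Model
from tensorflow.keras.applications.xception import Xception

# Instead of popping layers, take the output tensor of an earlier layer
# and wrap a new Model around it.
base_model = Xception(include_top=True, weights='imagenet')
sliced_model = Model(inputs=base_model.input, outputs=base_model.layers[-2].output)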

Thank you in advance

PolarBean avatar Feb 17 '23 15:02 PolarBean

The issue with this is it is now impossible to load weights saved by a model which ...

Will it work with skip_mismatch=True?

Model.load_weights(filepath, by_name=False, skip_mismatch=False, options=None)

https://keras.io/api/models/model_saving_apis/
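
For example, something like this (untested sketch; note that skip_mismatch only takes effect when by_name=True):

# Load weights by layer name and skip any layer whose saved shape no longer matches
model.load_weights(filepath, by_name=True, skip_mismatch=True)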

innat avatar Feb 17 '23 19:02 innat

@PolarBean Could you please try the suggestion in the comment above and let us know whether it works? Thank you!

sushreebarsa avatar Feb 18 '23 13:02 sushreebarsa

With by_name and skip_mismatch both set to False the weights do not load; however, with both set to True they do load. See the two attached scripts for replication: scripts.zip

I will now test whether the model with these two options behaves as expected.

"""
@filename = 1_15ModelBuild.py
This script has been tested and works with tensorflow 1.15.0
"""

import tensorflow as tf
from tensorflow.keras.applications.xception import Xception
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input

base_model = Xception(include_top=True, weights='imagenet')
base_model._layers.pop()
base_model._layers.pop()
inputs = Input(shape=(299, 299, 3))
base_model_layer = base_model(inputs, training=True)
dense1_layer = Dense(256, activation="relu")(base_model_layer)
dense2_layer = Dense(256, activation="relu")(dense1_layer)
output_layer = Dense(9, activation="linear")(dense2_layer)
model = Model(inputs=inputs, outputs=output_layer)
model.save_weights('1_15_output_test.h5')


model.load_weights('1_15_output_test.h5')
print(f'successfully loaded weights with {tf.__version__}')


1_15ModelBuild.py output

successfully loaded weights with 1.15.0

"""
@filename = 2_10ModelLoad.py
This script has been tested and works with tensorflow 2.10.0
"""


from tensorflow.keras.applications.xception import Xception
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
import tensorflow as tf


# Since we cannot use the _layers API in TF 2.x, we instead take the output of the third-from-last layer
base_model = Xception(include_top=True, weights='imagenet')
third_to_last_layer = base_model.layers[-3].output
base_model = Model(inputs=base_model.input, outputs=third_to_last_layer)


inputs = Input(shape=(299, 299, 3))
base_model_layer = base_model(inputs, training=True)

dense1_layer = Dense(256, activation="relu")(base_model_layer)
dense2_layer = Dense(256, activation="relu")(dense1_layer)
output_layer = Dense(9, activation="linear")(dense2_layer)

model = Model(inputs=inputs, outputs=output_layer)
# try every combination of options for loading weights in tf 2.0
skip_mismatch = [False, True]
by_name = [False, True]
for skip in skip_mismatch:
    for by in by_name:
        try:
            model.load_weights('1_15_output_test.h5', skip_mismatch=skip, by_name=by)
            print(f'successfully loaded weights with {tf.__version__} and skip_mismatch={skip} and by_name={by}')
        except Exception:
            print(f'failed to load weights with {tf.__version__} and skip_mismatch={skip} and by_name={by}')


2_10ModelLoad.py output

failed to load weights with 2.10.0 and skip_mismatch=False and by_name=False
failed to load weights with 2.10.0 and skip_mismatch=False and by_name=True
failed to load weights with 2.10.0 and skip_mismatch=True and by_name=False
WARNING:tensorflow:Skipping loading weights for layer #2 (named dense) due to mismatch in shape for weight dense/kernel:0. Weight expects shape (2048, 256). Received saved weight with shape (1000, 256)
successfully loaded weights with 2.10.0 and skip_mismatch=True and by_name=True

PolarBean avatar Feb 20 '23 11:02 PolarBean

But I doubt that it will work correctly, since the model summary created in 1.15 is:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         [(None, 299, 299, 3)]     0         
_________________________________________________________________
xception (Model)             (None, 1000)              20861480  
_________________________________________________________________
dense (Dense)                (None, 256)               256256    
_________________________________________________________________
dense_1 (Dense)              (None, 256)               65792     
_________________________________________________________________
dense_2 (Dense)              (None, 9)                 2313      
=================================================================
Total params: 21,185,841
Trainable params: 21,131,313
Non-trainable params: 54,528

And the model summary created in 2.10 is

_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 299, 299, 3)]     0         
                                                                 
 model (Functional)          (None, 10, 10, 2048)      20861480  
                                                                 
 dense (Dense)               (None, 10, 10, 256)       524544    
                                                                 
 dense_1 (Dense)             (None, 10, 10, 256)       65792     
                                                                 
 dense_2 (Dense)             (None, 10, 10, 9)         2313      
                                                                 
=================================================================
Total params: 21,454,129
Trainable params: 21,399,601
Non-trainable params: 54,528




PolarBean avatar Feb 20 '23 11:02 PolarBean

So it appears that in 1.15 _layers.pop() does not modify the model in place, since model.summary() states the output size is 1000 (the original output size of the Xception model)... However, it clearly does remove something, because if I run model.layers[1].summary() the final few layers of the Xception model look like this:

__________________________________________________________________________________________________
block14_sepconv2 (SeparableConv (None, 10, 10, 2048) 3159552     block14_sepconv1_act[0][0]       
__________________________________________________________________________________________________
block14_sepconv2_bn (BatchNorma (None, 10, 10, 2048) 8192        block14_sepconv2[0][0]           
__________________________________________________________________________________________________
block14_sepconv2_act (Activatio (None, 10, 10, 2048) 0           block14_sepconv2_bn[0][0]        
==================================================================================================
Total params: 20,861,480
Trainable params: 20,806,952
Non-trainable params: 54,528
_____________________________

How then can the Xception layer's output shape still be (None, 1000)?

PolarBean avatar Feb 20 '23 11:02 PolarBean

@PolarBean Are you sure that's correct? tf.keras came after TF 2.0.

"""
@filename = 1_15ModelBuild.py
This script has been tested and works with tensorflow 1.15.0
"""

from tensorflow.keras.applications.xception import Xception
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input

Apart from that, regarding ._layers.pop(), please check https://github.com/keras-team/keras/issues/15542. cc @qlzh727

innat avatar Feb 20 '23 15:02 innat

Looking at https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/keras, it seems that tf.keras is indeed part of 1.15.

But yes it seems that you have had a similar issue. Hopefully there is a workaround :)

PolarBean avatar Feb 20 '23 15:02 PolarBean

Any help with this would be amazing, as this issue prevents me from releasing my package on PyPI.

PolarBean avatar Feb 21 '23 08:02 PolarBean

I think, overall, layers.pop will not be supported on functional models. Since a functional model is, generally speaking, a directed graph, a pop operation is not really well defined. It sounds like the issue here is really about finding a workaround for load_weights() from an older "popped layers" model to a newer "sliced" functional model. Is that right?

I would highly suspect there is a way to either modify the weights to fit your needs here, or run your own loading shim for these old weights. But we probably don't have the bandwidth on the Keras team right now to work out what that would need to look like. It may require digging into the guts of the Keras load_weights call and seeing where things go wrong.

Anyone in the community who has insights please chime in here!

mattdangerw avatar Mar 17 '23 19:03 mattdangerw

Hi PolarBean, I was actually using your DeepSlice repository when I came across this error. I'm struggling to find a workaround for base_model._layers.pop() as well. Did you find a way of brute-forcing your way into using the Xception weights? Thanks in advance, Riley

rdaggs avatar Feb 22 '24 23:02 rdaggs

@PolarBean I'm not sure if this is still useful to you or anybody else, but I ran into this issue recently, using Python 3.11.5 and TensorFlow 2.13.1. I think I've found a way to circumvent the bug that should be compatible with TensorFlow 2.x.

The base_model._layers.pop() call in the original code only cosmetically removes the last two layers of the Xception model; during the actual execution of the graph, those last two layers (the average-pooling and softmax layers) are still applied. Thus, the output shape of the Xception layer in the overall DSModel.model is still (1000,) and correctly interfaces with the rest of the model. However, if you try to access base_model.layers or even base_model.summary(), those last two layers are hidden and nowhere to be seen.
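
To illustrate what is being described, a minimal check looks roughly like this (a sketch based on the summaries posted above; exact behaviour may vary between TF versions):

from tensorflow.keras.applications.xception import Xception

base_model = Xception(include_top=True, weights='imagenet')
base_model._layers.pop()
base_model._layers.pop()
print(base_model.layers[-1].name)   # the pooling/prediction layers no longer appear here
print(base_model.output.shape)      # but the graph output is still (None, 1000)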

My workaround is that, instead of using model.load_weights(), we manually set the weights of each layer using model.layers[idx].set_weights(list_of_numpy_weights) after loading the weights with h5py. I wrote the following function to be included in the neural_network.py module; it should be called wherever model.load_weights() would otherwise be called:

import h5py
import numpy as np


def load_xception_weights(model, weights):
    with h5py.File(weights, "r") as new:
        # set the weights of the dense head layers manually
        model.layers[1].set_weights([new["dense"]["dense"]["kernel:0"], new["dense"]["dense"]["bias:0"]])
        model.layers[2].set_weights([new["dense_1"]["dense_1"]["kernel:0"], new["dense_1"]["dense_1"]["bias:0"]])
        model.layers[3].set_weights([new["dense_2"]["dense_2"]["kernel:0"], new["dense_2"]["dense_2"]["bias:0"]])

        # Set the weights of the xception model
        weight_names = new["xception"].attrs["weight_names"].tolist()
        weight_names_layers = [name.decode("utf-8").split("/")[0] for name in weight_names]

        for i in range(len(model.layers[0].layers)):
            name_of_layer = model.layers[0].layers[i].name
            # if the layer name is in the saved weight names, set its weights
            if name_of_layer in weight_names_layers:
                # Get the names of the weights in the layer
                layer_weight_names = []
                for weight in model.layers[0].layers[i].weights:
                    layer_weight_names.append(weight.name.split("/")[1])
                h5_group = new["xception"][name_of_layer]
                weights_list = [np.array(h5_group[kk]) for kk in layer_weight_names]
                model.layers[0].layers[i].set_weights(weights_list)
    return model
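
Usage would then look something like this (a sketch; the .h5 path is whatever file the old 1.15 code saved):

# Replace the original model.load_weights(...) call with the shim above
model = load_xception_weights(model, "1_15_output_test.h5")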

wjguan avatar May 23 '24 17:05 wjguan

Oh wow, that's amazing. Are you using this fix with DeepSlice? If so, perhaps you could open an issue on DeepSlice and we can discuss opening a pull request? Thank you!

PolarBean avatar May 23 '24 18:05 PolarBean

@PolarBean Yup, I tried it and got the same results between tensorflow 1.15 and 2.13! I'll open an issue on DeepSlice

wjguan avatar May 23 '24 18:05 wjguan

On functional models, layers.pop is not supported because of their directed-graph nature, which makes the concept of "popping" ambiguous. The core challenge here is loading weights from an older model with "popped" layers into a newer functional model with a "sliced" architecture. The Keras team's bandwidth is currently limited and focused more on Keras 3.0, but possible approaches could involve adapting the weights to the new structure or implementing custom loading logic. This might require examining Keras' load_weights function internally to understand the source of the incompatibility.

tilakrayal avatar May 16 '25 05:05 tilakrayal