tf-keras
layers.pop no longer removes the last layer
Hi,
I am stuck on an issue where layers.pop() no longer removes the last layer. My workaround was to use _layers.pop(), but this has now been deprecated. I am aware that the new usage is to take the output of Model.layers[-2] (or whichever layer you would like as the output) and define a new model with that tensor as the output (sketched just below). The problem with this is that it is now impossible to load weights saved by a model which used the original .pop() method to remove the last n layers. Is there any workaround to get the old _layers.pop() behaviour back?
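For reference, the replacement pattern I mean is roughly this (a sketch using Xception, as in my scripts below):

from tensorflow.keras import Model
from tensorflow.keras.applications.xception import Xception

base_model = Xception(include_top=True, weights='imagenet')
# Instead of popping layers, rebuild the model around an earlier layer's output
sliced_model = Model(inputs=base_model.input, outputs=base_model.layers[-3].output)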
Thank you in advance
The issue with this is it is now impossible to load weights saved by a model which ...
Will it work with skip_mismatch=True?
Model.load_weights(filepath, by_name=False, skip_mismatch=False, options=None)
https://keras.io/api/models/model_saving_apis/
@PolarBean Please refer to the comment above and let us know if it helps. Thank you!
With by_name and skip_mismatch both set to False it does not work; however, with both set to True the weights do load. See the two attached scripts for replication: scripts.zip
I will now test whether the model with these two options behaves as expected.
"""
@filename = 1_15ModelBuild.py
This script has been tested and works with tensorflow 1.15.0
"""
from tensorflow.keras.applications.xception import Xception
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
import tensorflow as tf
base_model = Xception(include_top=True, weights='imagenet')
# Remove the final two layers (pooling and prediction head) via the private API
base_model._layers.pop()
base_model._layers.pop()
inputs = Input(shape=(299, 299, 3))
base_model_layer = base_model(inputs, training=True)
dense1_layer = Dense(256, activation="relu")(base_model_layer)
dense2_layer = Dense(256, activation="relu")(dense1_layer)
output_layer = Dense(9, activation="linear")(dense2_layer)
model = Model(inputs=inputs, outputs=output_layer)
model.save_weights('1_15_output_test.h5')
model.load_weights('1_15_output_test.h5')
print(f'successfully loaded weights with {tf.__version__}')
1_15ModelBuild.py output
successfully loaded weights with 1.15.0
"""
@filename = 2_10ModelLoad.py
This script has been tested and works with tensorflow 2.10.0
"""
from tensorflow.keras.applications.xception import Xception
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
import tensorflow as tf
# Since we cannot use the _layers API in TF 2.x, we instead take the output of the third-from-last layer
base_model = Xception(include_top=True, weights='imagenet')
third_to_last_layer = base_model.layers[-3].output
base_model = Model(inputs=base_model.input, outputs=third_to_last_layer)
inputs = Input(shape=(299, 299, 3))
base_model_layer = base_model(inputs, training=True)
dense1_layer = Dense(256, activation="relu")(base_model_layer)
dense2_layer = Dense(256, activation="relu")(dense1_layer)
output_layer = Dense(9, activation="linear")(dense2_layer)
model = Model(inputs=inputs, outputs=output_layer)
# Try every combination of options for loading the old weights in TF 2.x
skip_mismatch = [False, True]
by_name = [False, True]
for skip in skip_mismatch:
    for by in by_name:
        try:
            model.load_weights('1_15_output_test.h5', skip_mismatch=skip, by_name=by)
            print(f'successfully loaded weights with {tf.__version__} and skip_mismatch={skip} and by_name={by}')
        except Exception:
            print(f'failed to load weights with {tf.__version__} and skip_mismatch={skip} and by_name={by}')
2_10ModelLoad.py output
failed to load weights with 2.10.0 and skip_mismatch=False and by_name=False
failed to load weights with 2.10.0 and skip_mismatch=False and by_name=True
failed to load weights with 2.10.0 and skip_mismatch=True and by_name=False
WARNING:tensorflow:Skipping loading weights for layer #2 (named dense) due to mismatch in shape for weight dense/kernel:0. Weight expects shape (2048, 256). Received saved weight with shape (1000, 256)
successfully loaded weights with 2.10.0 and skip_mismatch=True and by_name=True
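Note that the one "successful" combination silently skipped the mismatched dense layer, as the warning shows. To double-check which layers actually received weights, here is a quick sanity check (a sketch that assumes model was just built as above and has not been loaded yet; it snapshots the random initial weights and reports which layers changed):

import numpy as np

before = [layer.get_weights() for layer in model.layers]
model.load_weights('1_15_output_test.h5', skip_mismatch=True, by_name=True)
for layer, old in zip(model.layers, before):
    new = layer.get_weights()
    changed = any(not np.array_equal(o, n) for o, n in zip(old, new))
    print(f'{layer.name}: {"loaded" if changed else "unchanged (skipped or weightless)"}')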
But I doubt that it will work since the model summary created in 1.15 is
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 299, 299, 3)] 0
_________________________________________________________________
xception (Model) (None, 1000) 20861480
_________________________________________________________________
dense (Dense) (None, 256) 256256
_________________________________________________________________
dense_1 (Dense) (None, 256) 65792
_________________________________________________________________
dense_2 (Dense) (None, 9) 2313
=================================================================
Total params: 21,185,841
Trainable params: 21,131,313
Non-trainable params: 54,528
And the model summary created in 2.10 is
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 299, 299, 3)] 0
model (Functional) (None, 10, 10, 2048) 20861480
dense (Dense) (None, 10, 10, 256) 524544
dense_1 (Dense) (None, 10, 10, 256) 65792
dense_2 (Dense) (None, 10, 10, 9) 2313
=================================================================
Total params: 21,454,129
Trainable params: 21,399,601
Non-trainable params: 54,528
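(The Dense parameter counts confirm the shapes: 1000 × 256 + 256 = 256,256 in the 1.15 summary, meaning the first Dense layer was fed a 1000-dim vector, versus 2048 × 256 + 256 = 524,544 in 2.10, where it is fed the 2048-channel feature map.)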
So it appears that in 1.15 _layers.pop() does not modify the model in place, since model.summary() states the output size is 1000 (the original output size of the Xception model)... However, it clearly does modify something, because if I run model.layers[1].summary() the final few layers of the Xception model look like this:
__________________________________________________________________________________________________
block14_sepconv2 (SeparableConv (None, 10, 10, 2048) 3159552 block14_sepconv1_act[0][0]
__________________________________________________________________________________________________
block14_sepconv2_bn (BatchNorma (None, 10, 10, 2048) 8192 block14_sepconv2[0][0]
__________________________________________________________________________________________________
block14_sepconv2_act (Activatio (None, 10, 10, 2048) 0 block14_sepconv2_bn[0][0]
==================================================================================================
Total params: 20,861,480
Trainable params: 20,806,952
Non-trainable params: 54,528
_____________________________
How then can the input to the next layer have shape (None, 1000)??
@PolarBean
Are you sure that's correct? tf.keras came after TF 2.0.
"""
@filename = 1_15ModelBuild.py
This script has been tested and works with tensorflow 1.15.0
"""
from tensorflow.keras.applications.xception import Xception
from tensorflow.keras import Model
from tensorflow.keras.layers import Dense, Input
Apart from that, regarding ._layers.pop(), please check https://github.com/keras-team/keras/issues/15542. cc @qlzh727
https://www.tensorflow.org/versions/r1.15/api_docs/python/tf/keras
It seems that tf.keras is indeed part of 1.15.
But yes, it seems you have had a similar issue. Hopefully there is a workaround :)
Any help with this would be amazing, as this issue prevents me from releasing my package on PyPI.
I think, overall, layers.pop will not be supported on functional models. Generally speaking, a functional model is a directed graph, so a pop function is not really well defined. It sounds like the issue here is really finding a workaround for load_weights() from an older "popped layers" model to a newer "sliced" functional model. Is that right?
I would highly suspect there is a way to either modify the weights to fit your needs here, or run your own loading shim for these old weights. But we probably don't have the bandwidth on the Keras team right now to poke around with what that would need to look like. This may require digging into the guts of the Keras load_weights call and seeing where things go south.
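As a starting point for such a shim, the saved HDF5 file can be inspected directly with h5py to see which layer groups and weight names the old model wrote (a sketch, assuming the weights file produced by the scripts above):

import h5py

with h5py.File('1_15_output_test.h5', 'r') as f:
    # The root 'layer_names' attribute lists the layers the old model saved
    print([n.decode('utf-8') for n in f.attrs['layer_names']])
    for name in f:
        # Each layer group records its own 'weight_names'
        print(name, [w.decode('utf-8') for w in f[name].attrs.get('weight_names', [])])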
Anyone in the community who has insights please chime in here!
Hi PolarBean, I was actually using your DeepSlice repository when I came across this error. I'm struggling to find a workaround for base_model._layers.pop() as well. Did you find a way of brute-forcing your way into using the Xception weights? Thanks in advance, Riley
@PolarBean I'm not sure if this is still useful to you or anybody else, but I was running into this issue recently, using Python 3.11.5 and TensorFlow 2.13.1. I think I've found a solution that circumvents the bug and should be compatible with TensorFlow 2.x.
The base_model._layers.pop() call in the original code only cosmetically removes the last two layers of the Xception model; during the actual execution of the graph, those last two layers (the average-pooling and softmax layers) are still applied. Thus, the output shape of the Xception layer in the overall DSModel.model is still (1000,) and correctly interfaces with the rest of the model. However, if you try to access base_model.layers or even base_model.summary(), those last two layers are hidden and nowhere to be seen.
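A minimal sketch of that diagnosis (under TF 1.x, where _layers is still mutable): the layer list and the graph output disagree.

from tensorflow.keras.applications.xception import Xception

base_model = Xception(include_top=True, weights='imagenet')
base_model._layers.pop()
base_model._layers.pop()
# The layer list no longer shows the pooling/prediction layers...
print(base_model.layers[-1].name)
# ...but the graph output tensor still has the original (None, 1000) shape
print(base_model.output.shape)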
My workaround is, instead of using model.load_weights(), to manually set the weights of each layer using model.layers[idx].set_weights(list_of_numpy_weights) after loading the weights with h5py. I wrote this function to be included in the neural_network.py module. It should be called wherever model.load_weights() was being called:
import h5py
import numpy as np

def load_xception_weights(model, weights):
    with h5py.File(weights, "r") as new:
        # Set the weights of each dense layer manually
        model.layers[1].set_weights([new["dense"]["dense"]["kernel:0"], new["dense"]["dense"]["bias:0"]])
        model.layers[2].set_weights([new["dense_1"]["dense_1"]["kernel:0"], new["dense_1"]["dense_1"]["bias:0"]])
        model.layers[3].set_weights([new["dense_2"]["dense_2"]["kernel:0"], new["dense_2"]["dense_2"]["bias:0"]])
        # Set the weights of the Xception submodel
        weight_names = new["xception"].attrs["weight_names"].tolist()
        weight_names_layers = [name.decode("utf-8").split("/")[0] for name in weight_names]
        for i in range(len(model.layers[0].layers)):
            name_of_layer = model.layers[0].layers[i].name
            # If the layer name is among the saved weight names, set its weights
            if name_of_layer in weight_names_layers:
                # Get the names of the weights in this layer
                layer_weight_names = []
                for weight in model.layers[0].layers[i].weights:
                    layer_weight_names.append(weight.name.split("/")[1])
                h5_group = new["xception"][name_of_layer]
                weights_list = [np.array(h5_group[kk]) for kk in layer_weight_names]
                model.layers[0].layers[i].set_weights(weights_list)
    return model
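Usage would then be along these lines (reusing the weights file name from the earlier scripts):

model = load_xception_weights(model, '1_15_output_test.h5')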
Oh wow, that's amazing. Are you using this fix with DeepSlice? If so, perhaps you could open an issue on DeepSlice and we can discuss opening a pull request? Thank you!
@PolarBean Yup, I tried it and got the same results between TensorFlow 1.15 and 2.13! I'll open an issue on DeepSlice.
On functional models, layers.pop is not supported: because a functional model is a directed graph, the concept of "popping" a layer is not well defined. The core challenge in this thread was loading weights from an older model with "popped" layers into a newer functional model with a "sliced" architecture. The Keras team's bandwidth is currently limited and focused on Keras 3.0, so community workarounds are the practical path: adapting the saved weights to the new structure or implementing custom loading logic, as demonstrated above. This may require examining the internals of Keras' load_weights to understand the source of the incompatibility.