Keras-NASNet

How to load the pretrained model?

Open zuoxiang95 opened this issue 6 years ago • 8 comments

Hello @titu1994, I am using your code to train on my own dataset, and I want to start from the pretrained model that you provide in nasnet.py. The problem is that my dataset has 361 categories, while the pretrained model has 1000. How do I modify it? Looking forward to your reply! :)

zuoxiang95, Jun 14 '18 11:06

I built a model to load the pretrained weights like this:

    model = NASNetLarge((img_rows, img_cols, img_channels), use_auxiliary_branch=True, include_top=True)

But I get this error:

    Traceback (most recent call last):
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1567, in _create_c_op
        c_op = c_api.TF_FinishOperation(op_desc)
    tensorflow.python.framework.errors_impl.InvalidArgumentError: Dimension 0 in both shapes must be equal, but are 1 and 128. Shapes are [1,1,2688,128] and [128,2016,1,1]. for 'Assign_1524' (op: 'Assign') with input shapes: [1,1,2688,128], [128,2016,1,1].

During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "train.py", line 70, in <module>
        model = NASNetLarge((img_rows, img_cols, img_channels), use_auxiliary_branch=True, include_top=True)
      File "/home/zuoxiang/Keras-NASNet/nasnet.py", line 407, in NASNetLarge
        default_size=331)
      File "/home/zuoxiang/Keras-NASNet/nasnet.py", line 320, in NASNet
        model.load_weights(weights_file)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/network.py", line 1180, in load_weights
        f, self.layers, reshape=reshape)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/engine/saving.py", line 929, in load_weights_from_hdf5_group
        K.batch_set_value(weight_value_tuples)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2430, in batch_set_value
        assign_op = x.assign(assign_placeholder)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/variables.py", line 615, in assign
        return state_ops.assign(self._variable, value, use_locking=use_locking)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/state_ops.py", line 283, in assign
        validate_shape=validate_shape)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/ops/gen_state_ops.py", line 60, in assign
        use_locking=use_locking, name=name)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
        op_def=op_def)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3392, in create_op
        op_def=op_def)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1734, in __init__
        control_input_ops)
      File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1570, in _create_c_op
        raise ValueError(str(e))
    ValueError: Dimension 0 in both shapes must be equal, but are 1 and 128. Shapes are [1,1,2688,128] and [128,2016,1,1]. for 'Assign_1524' (op: 'Assign') with input shapes: [1,1,2688,128], [128,2016,1,1].

Do you know what's wrong? Thank you very much!
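
The two shapes in the error message can be compared directly. The sketch below uses a hypothetical helper, `diagnose_shape_mismatch`, which is not part of Keras or of this repository. It only checks whether two kernel shapes differ in axis order or in actual sizes. For the shapes reported above it finds a size difference (2688 in one kernel versus 2016 in the other), which suggests that the layer in the built model and the layer stored in the weights file were created with different channel counts, not just with different data formats.

```python
def diagnose_shape_mismatch(model_shape, file_shape):
    """Classify how a layer's kernel shape differs from a saved kernel shape.

    Hypothetical helper for illustration; works on plain lists or tuples.
    """
    model_shape, file_shape = tuple(model_shape), tuple(file_shape)
    if model_shape == file_shape:
        return "match"
    if len(model_shape) != len(file_shape):
        return "rank differs"
    # Equal multisets of dimensions mean some axis permutation reconciles them.
    if sorted(model_shape) == sorted(file_shape):
        return "axis order differs"
    return "sizes differ"


# Shapes copied from the error message in this thread.
print(diagnose_shape_mismatch([1, 1, 2688, 128], [128, 2016, 1, 1]))
```

If the result had been "axis order differs", a data format mismatch would be the more likely explanation.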

zuoxiang95, Jun 14 '18 13:06

You must have used an odd input shape here. Can you provide the full script with all variables?

titu1994, Jun 14 '18 15:06

Here are all my variables:

    lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.5), cooldown=0, patience=5, min_lr=0.5e-5)
    csv_logger = CSVLogger('NASNet-objction-classfication.csv')
    model_checkpoint = ModelCheckpoint(weights_file, monitor='val_predictions_acc', save_best_only=True, save_weights_only=True, mode='max')
    batch_size = 128
    nb_classes = 361
    nb_epoch = 200  # should be 600
    data_augmentation = True

    # input image dimensions
    img_rows, img_cols = 331, 331
    img_channels = 3
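
For reference, the `ReduceLROnPlateau` settings above multiply the learning rate by `sqrt(0.5)` on each plateau and never go below `min_lr`, so the rate halves every two reductions. The sketch below reproduces that arithmetic in plain Python. The starting rate of 0.01 is an assumption for illustration only, since the optimizer settings are not shown in this thread, and `plateau_schedule` is a hypothetical helper name.

```python
import math


def plateau_schedule(initial_lr, factor=math.sqrt(0.5), min_lr=0.5e-5):
    """List the learning rates reached if every plateau triggers a reduction."""
    lrs = [initial_lr]
    while lrs[-1] > min_lr:
        # Same update rule as the callback: scale by factor, clamp at min_lr.
        lrs.append(max(lrs[-1] * factor, min_lr))
    return lrs


# 0.01 is an assumed starting learning rate, not a value from this thread.
schedule = plateau_schedule(0.01)
print("reductions until the floor is reached:", len(schedule) - 1)
print("first few rates:", schedule[:4])
```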

zuoxiang95, Jun 15 '18 02:06

When I set use_auxiliary_branch=False and include_top=False and add the code below to my script, the model trains successfully. But another problem is that I can only set the batch size to 16; otherwise it runs out of memory (OOM). My GPU is a P40.

    base_model = NASNetLarge((img_rows, img_cols, img_channels), use_auxiliary_branch=False, include_top=False)
    x = base_model.output
    x = GlobalAveragePooling2D()(x)
    x = Dropout(dropout)(x)
    predictions = Dense(nb_classes, activation='softmax', kernel_regularizer=l2(weight_decay), name='predictions')(x)
    model = Model(inputs=base_model.input, outputs=predictions)
    model.summary()
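
A short sketch of why the pretrained 1000-class head cannot be reused for 361 classes, while everything below it can. It assumes that the feature vector after global average pooling in NASNetLarge has 4032 channels (the penultimate filter count of the standard large configuration). The helper names are hypothetical and only do shape arithmetic.

```python
# Assumed feature width after global average pooling in NASNetLarge.
FEATURES = 4032


def dense_shapes(n_features, n_classes):
    """Return the (kernel_shape, bias_shape) a Dense layer would have."""
    return (n_features, n_classes), (n_classes,)


def dense_param_count(n_features, n_classes):
    """Number of trainable parameters in a Dense layer with a bias."""
    return n_features * n_classes + n_classes


pretrained_head = dense_shapes(FEATURES, 1000)  # ImageNet classifier
custom_head = dense_shapes(FEATURES, 361)       # classifier for this dataset

print("pretrained head:", pretrained_head, dense_param_count(FEATURES, 1000))
print("custom head:    ", custom_head, dense_param_count(FEATURES, 361))
print("head weights reusable:", pretrained_head == custom_head)
```

Because only the final layer's shapes depend on the number of classes, loading the weights with include_top=False and attaching a new Dense layer, as in the snippet above, sidesteps the mismatch.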

zuoxiang95, Jun 15 '18 02:06

Weird. I don't get this error. I am using TF with the channels-last data format. I'm guessing you are using the same, so I don't understand the cause.

titu1994, Jun 15 '18 03:06

Yes, I am using the generator function in imagenet_validation.py.

zuoxiang95, Jun 15 '18 03:06

Hello @titu1994, what batch size did you use when you trained NASNet Large?

zuoxiang95, Jul 13 '18 10:07

@titu1994 Same issue here. I think there is a problem with loading the auxiliary branch weights.

Sbakkalii, Dec 08 '19 18:12