train & test
Hello, and thank you for your effort. I want to train and test a model using the provided code, but in both cases it gives me the following error:
Can you guide me?
Thank you for your interest in our work. To resolve the error, please ensure you've correctly downloaded the Market-1501 dataset. Follow the instructions here to obtain the complete dataset. Once downloaded, verify that the 'bounding_box_train' folder contains 12936 .jpg files, 'bounding_box_test' contains 19732 .jpg files, and 'query' contains 3368 .jpg files.
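To double-check the extraction, you can count the files programmatically. A minimal sketch, assuming the dataset was extracted to a folder named `Market-1501-v15.09.15` (adjust the path to your setup):

```python
from pathlib import Path

# Assumed dataset root; adjust to wherever Market-1501 was extracted.
dataset_root = Path("Market-1501-v15.09.15")

expected_counts = {
    "bounding_box_train": 12936,
    "bounding_box_test": 19732,
    "query": 3368,
}
for folder, expected in expected_counts.items():
    found = len(list((dataset_root / folder).glob("*.jpg")))
    status = "OK" if found == expected else "MISMATCH"
    print(f"{folder}: {found} .jpg files (expected {expected}) -> {status}")
```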
Thank you for your quick and effective reply. I fixed the previous problem, but now I have a problem with the model. Thank you for your guidance.
```
    raise ValueError(
ValueError: Layer count mismatch when loading weights from file. Model expected 0 layers, found 2 saved layers.
```
It might be a compatibility issue between your setup (Python 3.12 and likely the latest TensorFlow) and the code's original environment (TensorFlow 2.2.3 and Python 3.8). Here are two approaches to address this (a quick version check is sketched after the list):
- Use the code's original environment. Refer to the instructions here. However, this approach might not be feasible for newer GPUs.
- Update the repository dependencies. Consider updating the repository to support the latest TensorFlow. You may train your own models as long as the final performance reflects the reported results.
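Whichever option you choose, it helps to first confirm which versions your script actually sees. A quick check (the output will depend on your environment):

```python
import sys

import tensorflow as tf

# The code was developed against Python 3.8 and TensorFlow 2.2.3; print what the
# current interpreter actually provides before deciding which option to take.
print("Python    :", sys.version.split()[0])
print("TensorFlow:", tf.__version__)
```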
Thank you very much. I installed TensorFlow 2.14 and Python 3.10.12. The program executed up to the point shown below, but then it gave an error, and I don't understand where the problem comes from. I also think the network parameters look a bit illogical, don't you think so?
Please also advise about the error.
Model: "training_model"
Layer (type) Output Shape Param # Connected to
input_11 (InputLayer) [(None, 384, 128, 3)] 0 []
inference_model (Functiona [(None, 2048), 7629920 ['input_11[0][0]',
l) (None, 1024), 0 'tf.image.flip_left_right[0][
(None, 1024)] 0]']
tf.image.flip_left_right ( (None, 384, 128, 3) 0 ['input_11[0][0]']
TFOpLambda)
tf.operators.add (TFOp (None, 2048) 0 ['inference_model[0][0]']
Lambda)
tf.operators.add_2 (TF (None, 1024) 0 ['inference_model[0][1]']
OpLambda)
tf.operators.add_4 (TF (None, 1024) 0 ['inference_model[0][2]']
OpLambda)
tf.operators.add_1 (TF (None, 2048) 0 ['tf.operators.add[0][0]', OpLambda) 'inference_model[1][0]']
tf.operators.add_3 (TF (None, 1024) 0 ['tf.operators.add_2[0][0]
OpLambda) ',
'inference_model[1][1]']
tf.operators.add_5 (TF (None, 1024) 0 ['tf.operators.add_4[0][0]
OpLambda) ',
'inference_model[1][2]']
tf.math.truediv_2 (TFOpLam (None, 2048) 0 ['tf.operators.add_1[0][0] bda) ']
tf.math.truediv_3 (TFOpLam (None, 1024) 0 ['tf.operators.add_3[0][0] bda) ']
tf.math.truediv_4 (TFOpLam (None, 1024) 0 ['tf.operators.add_5[0][0] bda) ']
classification_model (Func [(None, 751), 3092480 ['tf.math.truediv_2[0][0]',
tional) (None, 751), 'tf.math.truediv_3[0][0]',
(None, 751)] 'tf.math.truediv_4[0][0]']
tf.convert_to_tensor (TFOp (None, 2048) 0 ['inference_model[1][0]']
Lambda)
tf.cast (TFOpLambda) (None, 2048) 0 ['inference_model[0][0]']
tf.math.squared_difference (None, 2048) 0 ['tf.convert_to_tensor[0][0]', (TFOpLambda) 'tf.cast[0][0]']
tf.convert_to_tensor_1 (TF (None, 1024) 0 ['inference_model[1][1]']
OpLambda)
tf.cast_1 (TFOpLambda) (None, 1024) 0 ['inference_model[0][1]']
tf.math.reduce_mean_3 (TFO (None,) 0 ['tf.math.squared_difference[0 pLambda) ][0]']
tf.math.squared_difference (None, 1024) 0 ['tf.convert_to_tensor_1[0][0]
_1 (TFOpLambda) ',
'tf.cast_1[0][0]']
tf.convert_to_tensor_2 (TF (None, 1024) 0 ['inference_model[1][2]']
OpLambda)
tf.cast_2 (TFOpLambda) (None, 1024) 0 ['inference_model[0][2]']
tf.math.reduce_mean_4 (TFO () 0 ['tf.math.reduce_mean_3[0][0]' pLambda) ]
tf.math.reduce_mean_5 (TFO (None,) 0 ['tf.math.squared_difference_1 pLambda) [0][0]']
tf.math.squared_difference (None, 1024) 0 ['tf.convert_to_tensor_2[0][0]
_2 (TFOpLambda) ',
'tf.cast_2[0][0]']
tf.operators.add_6 (TF () 0 ['tf.math.reduce_mean_4[0][0]' OpLambda) ]
tf.math.reduce_mean_6 (TFO () 0 ['tf.math.reduce_mean_5[0][0]' pLambda) ]
tf.math.reduce_mean_7 (TFO (None,) 0 ['tf.math.squared_difference_2 pLambda) [0][0]']
tf.operators.add_7 (TF () 0 ['tf.operators.add_6[0][0]
OpLambda) ',
'tf.math.reduce_mean_6[0][0]'
]
tf.math.reduce_mean_8 (TFO () 0 ['tf.math.reduce_mean_7[0][0]' pLambda) ]
tf.operators.add_8 (TF () 0 ['tf.operators.add_7[0][0]
OpLambda) ',
'tf.math.reduce_mean_8[0][0]'
]
add_metric (AddMetric) () 0 ['tf.operators.add_8[0][0] ']
tf.math.multiply (TFOpLamb () 0 ['tf.operators.add_8[0][0] da) ']
add_loss (AddLoss) () 0 ['tf.math.multiply[0][0]']
================================================================================================== Total params: 79391680 (302.86 MB) Trainable params: 40835072 (155.77 MB) Non-trainable params: 38556608 (147.08 MB)
Summarizing inference_model_132322870669664 ... Model: "inference_model"
Layer (type) Output Shape Param # Connected to
input_1 (InputLayer) [(None, 384, 128, 3)] 0 []
preprocess_input (Function (None, 384, 128, 3) 0 ['input_1[0][0]']
al)
features/init_block (Funct (None, 96, 32, 64) 9664 ['preprocess_input[0][0]']
ional)
features/stage1 (Functiona (None, 96, 32, 256) 218624 ['features/init_block[0][0]'] l)
features/stage2 (Functiona (None, 48, 16, 512) 1226752 ['features/stage1[0][0]']
l)
features/stage3 (Functiona (None, 24, 8, 1024) 7118848 ['features/stage2[0][0]']
l)
features/stage4_regional_b (None, 24, 8, 2048) 1498726 ['features/stage3[0][0]']
ranch (Functional) 4
lambda_1 (Lambda) (None, 12, 8, 2048) 0 ['features/stage4_regional_bra nch[0][0]']
lambda_3 (Lambda) (None, 12, 8, 2048) 0 ['features/stage4_regional_bra nch[0][0]']
conv2d (Conv2D) (None, 12, 8, 1024) 1887539 ['lambda_1[0][0]']
2
conv2d_1 (Conv2D) (None, 12, 8, 1024) 1887539 ['lambda_3[0][0]']
2
features/stage4_global_bra (None, 24, 8, 2048) 1498726 ['features/stage3[0][0]']
nch (Functional) 4
activation (Activation) (None, 12, 8, 1024) 0 ['conv2d[0][0]']
activation_1 (Activation) (None, 12, 8, 1024) 0 ['conv2d_1[0][0]']
tf.math.maximum (TFOpLambd (None, 24, 8, 2048) 0 ['features/stage4_global_branc a) h[0][0]']
tf.math.maximum_1 (TFOpLam (None, 12, 8, 1024) 0 ['activation[0][0]']
bda)
tf.math.maximum_2 (TFOpLam (None, 12, 8, 1024) 0 ['activation_1[0][0]']
bda)
tf.math.pow (TFOpLambda) (None, 24, 8, 2048) 0 ['tf.math.maximum[0][0]']
tf.math.pow_2 (TFOpLambda) (None, 12, 8, 1024) 0 ['tf.math.maximum_1[0][0]']
tf.math.pow_4 (TFOpLambda) (None, 12, 8, 1024) 0 ['tf.math.maximum_2[0][0]']
tf.math.reduce_mean (TFOpL (None, 2048) 0 ['tf.math.pow[0][0]']
ambda)
tf.math.reduce_mean_1 (TFO (None, 1024) 0 ['tf.math.pow_2[0][0]']
pLambda)
tf.math.reduce_mean_2 (TFO (None, 1024) 0 ['tf.math.pow_4[0][0]']
pLambda)
tf.math.pow_1 (TFOpLambda) (None, 2048) 0 ['tf.math.reduce_mean[0][0]']
tf.math.pow_3 (TFOpLambda) (None, 1024) 0 ['tf.math.reduce_mean_1[0][0]' ]
tf.math.pow_5 (TFOpLambda) (None, 1024) 0 ['tf.math.reduce_mean_2[0][0]' ]
lambda (Lambda) (None, 2048) 0 ['tf.math.pow_1[0][0]']
lambda_2 (Lambda) (None, 1024) 0 ['tf.math.pow_3[0][0]']
lambda_4 (Lambda) (None, 1024) 0 ['tf.math.pow_5[0][0]']
================================================================================================== Total params: 76299200 (291.06 MB) Trainable params: 37750784 (144.01 MB) Non-trainable params: 38548416 (147.05 MB)
Summarizing preprocess_input_132323024335632 ... Model: "preprocess_input"
Layer (type) Output Shape Param #
input_2 (InputLayer) [(None, 384, 128, 3)] 0
tf.math.truediv (TFOpLambd (None, 384, 128, 3) 0
a)
tf.nn.bias_add (TFOpLambda (None, 384, 128, 3) 0
)
tf.math.truediv_1 (TFOpLam (None, 384, 128, 3) 0
bda)
================================================================= Total params: 0 (0.00 Byte) Trainable params: 0 (0.00 Byte) Non-trainable params: 0 (0.00 Byte)
Summarizing features/init_block_132323154296304 ... Model: "features/init_block"
Layer (type) Output Shape Param #
input_3 (InputLayer) [(None, 384, 128, 3)] 0
conv (ConvBlock) (None, 192, 64, 64) 9664
pool (MaxPool2d) (None, 96, 32, 64) 0
================================================================= Total params: 9664 (37.75 KB) Trainable params: 0 (0.00 Byte) Non-trainable params: 9664 (37.75 KB)
Summarizing features/stage1_132323035711712 ... Model: "features/stage1"
Layer (type) Output Shape Param #
input_4 (InputLayer) [(None, 96, 32, 64)] 0
stage1/unit_0_1 (ResUnit) (None, 96, 32, 256) 76288
stage1/unit_0_2 (ResUnit) (None, 96, 32, 256) 71168
stage1/unit_0_3 (ResUnit) (None, 96, 32, 256) 71168
================================================================= Total params: 218624 (854.00 KB) Trainable params: 0 (0.00 Byte) Non-trainable params: 218624 (854.00 KB)
Summarizing features/stage2_132323040862960 ... Model: "features/stage2"
Layer (type) Output Shape Param #
input_5 (InputLayer) [(None, 96, 32, 256)] 0
stage2/unit_1_1 (ResUnit) (None, 48, 16, 512) 381952
stage2/unit_1_2 (ResUnit) (None, 48, 16, 512) 281600
stage2/unit_1_3 (ResUnit) (None, 48, 16, 512) 281600
stage2/unit_1_4 (ResUnit) (None, 48, 16, 512) 281600
================================================================= Total params: 1226752 (4.68 MB) Trainable params: 0 (0.00 Byte) Non-trainable params: 1226752 (4.68 MB)
Summarizing features/stage3_132323037229984 ... Model: "features/stage3"
Layer (type) Output Shape Param #
input_6 (InputLayer) [(None, 48, 16, 512)] 0
stage3/unit_2_1 (ResUnit) (None, 24, 8, 1024) 1517568
stage3/unit_2_2 (ResUnit) (None, 24, 8, 1024) 1120256
stage3/unit_2_3 (ResUnit) (None, 24, 8, 1024) 1120256
stage3/unit_2_4 (ResUnit) (None, 24, 8, 1024) 1120256
stage3/unit_2_5 (ResUnit) (None, 24, 8, 1024) 1120256
stage3/unit_2_6 (ResUnit) (None, 24, 8, 1024) 1120256
================================================================= Total params: 7118848 (27.16 MB) Trainable params: 0 (0.00 Byte) Non-trainable params: 7118848 (27.16 MB)
Summarizing features/stage4_regional_branch_132323156030704 ... Model: "features/stage4_regional_branch"
Layer (type) Output Shape Param #
input_7 (InputLayer) [(None, 24, 8, 1024)] 0
unit_3_1 (ResUnit) (None, 24, 8, 2048) 6049792
unit_3_2 (ResUnit) (None, 24, 8, 2048) 4468736
unit_3_3 (ResUnit) (None, 24, 8, 2048) 4468736
================================================================= Total params: 14987264 (57.17 MB) Trainable params: 0 (0.00 Byte) Non-trainable params: 14987264 (57.17 MB)
Summarizing features/stage4_global_branch_132323034999696 ... Model: "features/stage4_global_branch"
Layer (type) Output Shape Param #
input_7 (InputLayer) [(None, 24, 8, 1024)] 0
unit_3_1 (ResUnit) (None, 24, 8, 2048) 6049792
unit_3_2 (ResUnit) (None, 24, 8, 2048) 4468736
unit_3_3 (ResUnit) (None, 24, 8, 2048) 4468736
================================================================= Total params: 14987264 (57.17 MB) Trainable params: 0 (0.00 Byte) Non-trainable params: 14987264 (57.17 MB)
Summarizing classification_model_132322843232480 ... Model: "classification_model"
Layer (type) Output Shape Param # Connected to
input_8 (InputLayer) [(None, 2048)] 0 []
input_9 (InputLayer) [(None, 1024)] 0 []
input_10 (InputLayer) [(None, 1024)] 0 []
batch_normalization (Batch (None, 2048) 8192 ['input_8[0][0]']
Normalization)
batch_normalization_1 (Bat (None, 1024) 4096 ['input_9[0][0]']
chNormalization)
batch_normalization_2 (Bat (None, 1024) 4096 ['input_10[0][0]']
chNormalization)
dense (Dense) (None, 751) 1538048 ['batch_normalization[0][0]']
dense_1 (Dense) (None, 751) 769024 ['batch_normalization_1[0][0]' ]
dense_2 (Dense) (None, 751) 769024 ['batch_normalization_2[0][0]' ]
activation_2 (Activation) (None, 751) 0 ['dense[0][0]']
activation_3 (Activation) (None, 751) 0 ['dense_1[0][0]']
activation_4 (Activation) (None, 751) 0 ['dense_2[0][0]']
================================================================================================== Total params: 3092480 (11.80 MB) Trainable params: 3084288 (11.77 MB) Non-trainable params: 8192 (32.00 KB)
```
Traceback (most recent call last):
  File "/content/FlipReID/solution.py", line 925, in
```
- The error `AttributeError: 'Functional' object has no attribute '_layers'. Did you mean: 'layers'?` is caused by the differences in TensorFlow. The provided code uses TensorFlow 2.2.3, rather than TensorFlow 2.14. Could you try using the same environment as that one?
- `visualize_model` would plot the model, and it is not essential. You may comment out this line (or guard it, as in the sketch below) and check whether the program runs afterwards.
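If you would rather keep some form of plotting, one possible workaround is to route it through the standard Keras utility and ignore failures. This is only a sketch that replaces the repository's `visualize_model` helper, not the original implementation:

```python
import tensorflow as tf

def safe_plot_model(model: tf.keras.Model, path: str = "model.png") -> None:
    """Best-effort architecture plot; a failure here should not abort training."""
    try:
        # Requires pydot and graphviz; otherwise the except branch is taken.
        tf.keras.utils.plot_model(model, to_file=path, show_shapes=True)
    except Exception as err:
        print(f"Skipping model visualization: {err}")
```

Calling `safe_plot_model(training_model, 'training_model.png')` at the point where `visualize_model` is currently invoked keeps the rest of the script untouched.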
I did what you said. It is now running, but it has been on the first epoch for about 1 hour and 40 minutes without any results. Is that normal?
- 1 hour and 40 minutes sounds too long if `steps_per_epoch` is set to 200.
- I remember that the training procedure is very efficient, and the GPU utilization rate should be around 100% most of the time. You may check the output of `nvidia-smi` (and see the TensorFlow-side check sketched below).
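As a quick TensorFlow-side check (generic TensorFlow 2.x code, not taken from the repository), you can list the visible GPUs and log where operations are placed:

```python
import tensorflow as tf

# List the GPUs TensorFlow can see.
print("GPUs visible to TensorFlow:", tf.config.list_physical_devices("GPU"))

# Log the device each op runs on; the matrix multiply should report GPU:0 placement.
tf.debugging.set_log_device_placement(True)
a = tf.random.normal((1000, 1000))
print(tf.reduce_sum(tf.matmul(a, a)))
```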
I checked. It doesn't seem to use the GPU, although it can recognize that the GPU is available.
```
torch.cuda.is_available(): True
torch.device: cuda
Initiating the image augmentor ...
Perform training ...
Freeze layers in the backbone model for 20 epochs.
Epoch 1: LearningRateScheduler setting learning rate to 2e-06.
Epoch 1/20
WARNING:tensorflow:From /usr/local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:660: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
W0827 09:33:01.223699 136109813311296 deprecation.py:50] From /usr/local/lib/python3.10/site-packages/tensorflow/python/util/deprecation.py:660: calling map_fn_v2 (from tensorflow.python.ops.map_fn) with dtype is deprecated and will be removed in a future version.
Instructions for updating:
Use fn_output_signature instead
2024-08-27 09:33:13.213882: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 201326592 exceeds 10% of free system memory.
2024-08-27 09:33:13.590387: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 201326592 exceeds 10% of free system memory.
2024-08-27 09:33:14.429463: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 209780736 exceeds 10% of free system memory.
2024-08-27 09:33:14.468940: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 209780736 exceeds 10% of free system memory.
2024-08-27 09:33:14.759357: W tensorflow/tsl/framework/cpu_allocator_impl.cc:83] Allocation of 201326592 exceeds 10% of free system memory.
```
Could you make the code compatible with the updated versions of the installed packages?
Due to current workload constraints, I'm unable to update the repository for the latest TensorFlow version at this time. The repository is provided in its current state. However, I'm committed to assisting you with any questions or issues you may encounter. Before starting experiments with FlipReID, please verify that your environment is set up correctly. A straightforward way to do this is to try a simpler example like MNIST (a minimal sketch follows the list below). This will help confirm that your GPU is working and TensorFlow is installed properly. Once your environment is ready, you have two options:
- Use the recommended environment. The repository should work out of the box. Keep in mind that older TensorFlow versions may have compatibility limitations with newer GPUs due to dependencies like CUDA.
- Update the repository and use the latest TensorFlow. This shouldn't be too complicated. However, you might not be able to load the pre-trained weights I've provided. If this is the case, you can train the models from scratch.
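For the MNIST sanity check mentioned above, something along these lines is enough; it is a generic TensorFlow example, not part of FlipReID:

```python
import tensorflow as tf

# Generic MNIST sanity check: if TensorFlow and the GPU are set up correctly,
# each epoch should finish quickly and nvidia-smi should show load while it runs.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None].astype("float32") / 255.0  # shape (60000, 28, 28, 1)

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=2, batch_size=128)
```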
Hello, dear engineer. I have come back with a new question. I managed to run the code as far as training A and training B, but when saving the best model I get the following error:
```
Epoch 1: test_cosine_False_mAP_score improved from -inf to 0.03133, saving model to /content/FlipReID/output/Market1501_resnet50/training_model.h5
/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py:3103: UserWarning: You are saving your model as an HDF5 file via `model.save()`. This file format is considered legacy. We recommend using instead the native Keras format, e.g. `model.save('my_model.keras')`.
  saving_api.save_model(
Traceback (most recent call last):
  File "/content/FlipReID/solution.py", line 925, in
```
- You can follow the instructions in the log and try `model.save('my_model.keras')` instead of `model.save()` (see the sketch below).
- You can search for `TypeSpec class <class 'tensorflow.python.ops.resource_variable_ops.VariableSpec'> has not been registered` in the TensorFlow repository. This issue is again due to the newer version of TensorFlow.
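For the first point, a minimal sketch of switching the checkpoint to the native Keras format, assuming the file is written by a `ModelCheckpoint` callback (the metric name and output path are copied from your log; the actual code in `solution.py` may wire this up differently):

```python
import tensorflow as tf

# Hypothetical checkpoint callback using the native Keras format (.keras) instead of HDF5.
checkpoint_callback = tf.keras.callbacks.ModelCheckpoint(
    filepath="/content/FlipReID/output/Market1501_resnet50/training_model.keras",
    monitor="test_cosine_False_mAP_score",
    mode="max",
    save_best_only=True,
    verbose=1,
)
# Pass it to model.fit(..., callbacks=[checkpoint_callback]) in place of the HDF5 path.
```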