mmdetection3d
The model and loaded state dict do not match exactly
The model and loaded state dict do not match exactly
size mismatch for middle_encoder.conv_input.0.weight: copying a param with shape ('middle_encoder.conv_input.0.weight', torch.Size([4, 16, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([16, 3, 3, 3, 4]).
Which model and checkpoint do you use? We will check and update it ASAP.
model:second, checkpoint:configs/second/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class.py
python demo/pcd_demo.py demo/data/kitti/kitti_000000.bin configs/second/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class.py checkpoints/hv_second_secfpn_fp16_6x8_80e_kitti-3d-3class_20200925_110059-05f67bdf.pth --show
What version of mmdet3d are you using? Please follow the template for implementation to describe your issue such that we can have more information to reproduce your problem and locate it.
I also have the same problem where the checkpoint won't load. My mmdet3d version is '1.0.0rc2'. This could be solved by using torch.transpose to fit the shapes when loading the checkpoint.
Another problem is that the evaluation performance of the downloaded checkpoints is too low. The downloaded checkpoints load without any conversion, but the performance is far too low.
python tools/test.py configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py ckpt/hv_second_secfpn_6x8_80e_kitti-3d-car_20200620_230238-393f000c.pth --eval mAP
----------- AP11 Results ------------
Car AP11@0.70, 0.70, 0.70:
bbox AP11:9.9299, 10.3669, 10.6121
bev AP11:0.0433, 0.1324, 0.1343
3d AP11:0.0366, 0.0930, 0.0924
aos AP11:3.23, 4.70, 4.81
Car AP11@0.70, 0.50, 0.50:
bbox AP11:9.9299, 10.3669, 10.6121
bev AP11:0.1352, 0.3300, 0.3601
3d AP11:0.1266, 0.3040, 0.2868
aos AP11:3.23, 4.70, 4.81
----------- AP40 Results ------------
Car AP40@0.70, 0.70, 0.70:
bbox AP40:1.3368, 1.9814, 2.3379
bev AP40:0.0119, 0.0364, 0.0369
3d AP40:0.0101, 0.0256, 0.0254
aos AP40:0.51, 0.75, 0.90
Car AP40@0.70, 0.50, 0.50:
bbox AP40:1.3368, 1.9814, 2.3379
bev AP40:0.0743, 0.2669, 0.2871
3d AP40:0.0680, 0.2434, 0.1577
aos AP40:0.51, 0.75, 0.90
We will update the new checkpoints soon, please stay tuned.
I recommend using hv_second_secfpn_6x8_80e_kitti-3d-3class.py first for quick use; its pretrained model has been updated.
I use a modified configs/second/hv_second_secfpn_6x8_80e_kitti-3d-car.py to train the model on my custom dataset. I also encountered this "shape mismatch" problem when loading the *.pth for evaluation. @Tai-Wang
The model and loaded state dict do not match exactly
size mismatch for middle_encoder.conv_input.0.weight: copying a param with shape ('middle_encoder.conv_input.0.weight', torch.Size([4, 16, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([16, 3, 3, 3, 4]).
size mismatch for middle_encoder.encoder_layers.encoder_layer1.0.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer1.0.0.weight', torch.Size([16, 16, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([16, 3, 3, 3, 16]).
size mismatch for middle_encoder.encoder_layers.encoder_layer2.0.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer2.0.0.weight', torch.Size([16, 32, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([32, 3, 3, 3, 16]).
size mismatch for middle_encoder.encoder_layers.encoder_layer2.1.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer2.1.0.weight', torch.Size([32, 32, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([32, 3, 3, 3, 32]).
size mismatch for middle_encoder.encoder_layers.encoder_layer2.2.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer2.2.0.weight', torch.Size([32, 32, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([32, 3, 3, 3, 32]).
size mismatch for middle_encoder.encoder_layers.encoder_layer3.0.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer3.0.0.weight', torch.Size([32, 64, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([64, 3, 3, 3, 32]).
size mismatch for middle_encoder.encoder_layers.encoder_layer3.1.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer3.1.0.weight', torch.Size([64, 64, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([64, 3, 3, 3, 64]).
size mismatch for middle_encoder.encoder_layers.encoder_layer3.2.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer3.2.0.weight', torch.Size([64, 64, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([64, 3, 3, 3, 64]).
size mismatch for middle_encoder.encoder_layers.encoder_layer4.0.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer4.0.0.weight', torch.Size([64, 64, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([64, 3, 3, 3, 64]).
size mismatch for middle_encoder.encoder_layers.encoder_layer4.1.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer4.1.0.weight', torch.Size([64, 64, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([64, 3, 3, 3, 64]).
size mismatch for middle_encoder.encoder_layers.encoder_layer4.2.0.weight: copying a param with shape ('middle_encoder.encoder_layers.encoder_layer4.2.0.weight', torch.Size([64, 64, 3, 3, 3])) from checkpoint,the shape in current model is torch.Size([64, 3, 3, 3, 64]).
size mismatch for middle_encoder.conv_out.0.weight: copying a param with shape ('middle_encoder.conv_out.0.weight', torch.Size([64, 128, 3, 1, 1])) from checkpoint,the shape in current model is torch.Size([128, 3, 1, 1, 64]).
2022-07-15 14:03:26,484 - mmdet - INFO - Environment info:
------------------------------------------------------------
sys.platform: linux
Python: 3.6.13 |Anaconda, Inc.| (default, Jun 4 2021, 14:25:59) [GCC 7.5.0]
CUDA available: True
GPU 0: NVIDIA GeForce RTX 2080
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 10.2, V10.2.8
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.10.0+cu102
PyTorch compiling details: PyTorch built with:
- GCC 7.3
- C++ Version: 201402
- Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v2.2.3 (Git Hash 7336ca9f055cf1bfa13efb658fe15dc9b41f0740)
- OpenMP 201511 (a.k.a. OpenMP 4.5)
- LAPACK is enabled (usually provided by MKL)
- NNPACK is enabled
- CPU capability usage: AVX2
- CUDA Runtime 10.2
- NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70
- CuDNN 7.6.5
- Magma 2.5.2
- Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=10.2, CUDNN_VERSION=7.6.5, CXX_COMPILER=/opt/rh/devtoolset-7/root/usr/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.10.0, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON,
TorchVision: 0.11.0+cu102
OpenCV: 4.6.0
MMCV: 1.5.3
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 10.2
MMDetection: 2.25.0
MMSegmentation: 0.26.0
MMDetection3D: 1.0.0rc3+
spconv2.0: True
------------------------------------------------------------
Yes, I met this issue too. I found that the mismatch comes from the order of the dimensions being different.
I am also having the same issue with re-training the models SECOND and MVXNET and then running the test.py script.
Did you find a solution? I am trying to transpose the dims in the middle layers...
This could be solved by using torch.transpose to fit the shape when loading the checkpoints.
Could you please share the detailed solution? Thank you!
I have found a solution: just read the weight tensor at the corresponding position in the .pth file and permute it to the corresponding size.
@achao-c, could you please share the solution, specifically which file to modify and where? It would be much appreciated!
I used the following code to modify the model .pth file and solve the problem:
import torch

path = 'xxxxx\epoch_40.pth'
model = torch.load(path)

# 5-D sparse-conv weights in the middle encoder whose first axis has to be
# moved to the end, e.g. [4, 16, 3, 3, 3] -> [16, 3, 3, 3, 4].
keys = [
    'middle_encoder.conv_input.0.weight',
    'middle_encoder.encoder_layers.encoder_layer1.0.0.weight',
    'middle_encoder.encoder_layers.encoder_layer2.0.0.weight',
    'middle_encoder.encoder_layers.encoder_layer2.1.0.weight',
    'middle_encoder.encoder_layers.encoder_layer2.2.0.weight',
    'middle_encoder.encoder_layers.encoder_layer3.0.0.weight',
    'middle_encoder.encoder_layers.encoder_layer3.1.0.weight',
    'middle_encoder.encoder_layers.encoder_layer3.2.0.weight',
    'middle_encoder.encoder_layers.encoder_layer4.0.0.weight',
    'middle_encoder.encoder_layers.encoder_layer4.1.0.weight',
    'middle_encoder.encoder_layers.encoder_layer4.2.0.weight',
    'middle_encoder.conv_out.0.weight',
]

for key in keys:
    weight = model['state_dict'][key]
    # Chain of adjacent transposes, equivalent to permute(1, 2, 3, 4, 0).
    for dim in range(4):
        weight = torch.transpose(weight, dim, dim + 1)
    model['state_dict'][key] = weight

torch.save(model, "xxxxx.pth")
You can see my solution above.
@weishida01 many thanks for sharing your solution. Just curious about one thing: instead of torch.transpose, could torch.permute() work as well? Did you try it?
Yes, torch.permute() is a better solution.
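For reference, here is a minimal sketch of the same conversion written with a single permute per key. The placeholder paths and the key selection mirror the script above, and I have only checked it against the shapes reported in this thread:

import torch

ckpt = torch.load('xxxxx/epoch_40.pth', map_location='cpu')  # placeholder path
state_dict = ckpt['state_dict']

for key, weight in state_dict.items():
    # Convert only the 5-D sparse-conv kernels of the middle encoder,
    # i.e. the same keys as in the transpose-based script above.
    if key.startswith('middle_encoder') and key.endswith('.weight') and weight.dim() == 5:
        # e.g. [4, 16, 3, 3, 3] -> [16, 3, 3, 3, 4]
        state_dict[key] = weight.permute(1, 2, 3, 4, 0).contiguous()

torch.save(ckpt, 'xxxxx_converted.pth')  # placeholder path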
Are there any more elegant solutions? It's annoying to convert the weights manually every time.
I trained CenterPoint from scratch and also met the same problem.
This error seems related to SparseEncoder. Has it been reimplemented?
@JoeyforJoy, well, it has to be fixed in mmcv while loading or saving the checkpoints. The workaround I have created is a function that checks for the specific model, in this case "VoxelNet", and then performs the middle-layer shifting. It can be done inside the load_checkpoint function in the workspace/mmcv/mmcv/runner/checkpoint.py file.
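To make the idea concrete, here is a rough sketch of the same shifting done outside mmcv, so nothing under workspace/mmcv has to be edited. The function name load_checkpoint_with_spconv_fix is my own, and I have only verified the shape logic against the mismatches reported in this thread:

import torch

def load_checkpoint_with_spconv_fix(model, ckpt_path):
    # Permute old-layout 5-D sparse-conv weights so they match the shapes
    # expected by the current model, then load the state dict.
    ckpt = torch.load(ckpt_path, map_location='cpu')
    state_dict = ckpt.get('state_dict', ckpt)
    model_state = model.state_dict()

    for key, weight in state_dict.items():
        if key not in model_state or weight.dim() != 5:
            continue
        expected = tuple(model_state[key].shape)
        # The old layout carries the model's last dim first, e.g. a checkpoint
        # weight of [4, 16, 3, 3, 3] versus a model weight of [16, 3, 3, 3, 4].
        if tuple(weight.shape) != expected and \
                tuple(weight.permute(1, 2, 3, 4, 0).shape) == expected:
            state_dict[key] = weight.permute(1, 2, 3, 4, 0).contiguous()

    model.load_state_dict(state_dict, strict=False)
    return model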
Well, this is a good expedient that can load the right checkpoint automatically. The best way seems to be to reconstruct the ops in mmcv. If my memory serves me right, this problem appeared after mmdet3d moved its ops to mmcv and spconv was updated to spconv 2.0. The related versions may need to be checked.
Yes, you are right. This issue surfaced after the update to support spconv 2.0.
Hi! Does anyone know how to create my own config file?
Hello, your email has been received. I will reply to you as soon as possible.