HEAL
AssertionError: channel size mismatch
When I ran the Python command provided in the Code1 section, I got the following error:
AssertionError: channel size mismatch
When I tried to debug it, I found that it is related to the spconv_block layer of the SECOND class in heter_encoders.py under modules. This layer is an instance of the VoxelBackBone8x class; at initialization, input_channels is read from args['spconv']['num_features_in'] in the configuration file. I checked, and the value for the m3 stage is 64.
However, when I printed the input_sp_tensor variable just before it is fed into the conv_input layer in the forward function of spare_backbone_3d, its shape was torch.Size([168423, 4]). As mentioned above, conv_input expects 64 input channels, not 4.
I don't know what is going on. I also checked the configuration file carefully and found no errors. How can I solve this? (I did find a workaround: changing the value of model/m3/encoder_args/num_features_in in the configuration file to 4. The model then runs normally, but I am not sure whether this will affect training accuracy, or whether some unexpected error will occur later and break the training.)
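For context, here is a minimal standalone sketch of the mismatch. It is not the HEAL code; it assumes spconv 2.x, whose sparse convolution asserts that the incoming feature width equals in_channels, which is exactly the "channel size mismatch" message above:

import torch
import spconv.pytorch as spconv

# three voxels with distinct (batch, z, y, x) indices and 4 features each
indices = torch.tensor([[0, 0, 0, 0],
                        [0, 1, 2, 3],
                        [0, 2, 4, 6]], dtype=torch.int32).cuda()
features = torch.randn(3, 4).cuda()              # mean-VFE-style output: x, y, z, intensity
x = spconv.SparseConvTensor(features, indices, spatial_shape=[8, 8, 8], batch_size=1)

conv_64 = spconv.SubMConv3d(64, 16, 3, padding=1, indice_key="subm1").cuda()
try:
    conv_64(x)                                   # in_channels=64 but features have width 4
except AssertionError as e:
    print(e)                                     # -> channel size mismatch

conv_4 = spconv.SubMConv3d(4, 16, 3, padding=1, indice_key="subm1").cuda()
out = conv_4(x)                                  # runs once in_channels matches the feature width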
Code
Code1
python opencood/tools/inference.py --model_dir opencood/logs/HEAL_m1_based/stage2/m3_alignto_m1 --fusion_method intermediate
Figure
Figure 1: Debug statements
Figure 2: Debug information output
Figure 3: Configuration file modification
Hi, this should be caused by the difference between spconv 2.x and spconv 1.2.1; someone mentioned it in an issue before:
https://github.com/DerrickXuNu/OpenCOOD/commit/d0c64ad1a22ae246a39acb998f809eb1967ddd8b
I modified the code base based on this commit from Runsheng,
but it does seem that 4 is correct:
https://github.com/DerrickXuNu/OpenCOOD/blob/54e70314272b43ebd3e1c7c21ef01128ba670bde/opencood/models/second_intermediate.py#L23-L24
Let me test it again and get back to you.
When using spconv 1.2.1, the shape of voxel_feature is also [N_voxel, 4], consistent with yours. Yet my args['spconv']['num_features_in'] is indeed 64 as well.
I changed args['spconv']['num_features_in'] to 16, 8, and 4, and the model still ran normally, though not with arbitrary values.
spconv 1.2.1 probably does not check the tensor shape when processing it: the data is stored contiguously in memory, which would explain why a value of 64 also runs through. 4 should be the correct value; under this reading, my 64 effectively shrinks the voxel number. I haven't yet looked at how the spconv code handles, or skips over, the resulting shape error. Setting it to 4 is definitely correct.
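For what it's worth, here is a rough way to picture the "shrunken voxel number" hypothesis above. This is an assumption about contiguous memory layout, not the actual spconv 1.2.1 source: if a kernel walks the same contiguous buffer using the configured channel width, the N_voxel x 4 features read as far fewer 64-wide rows.

import torch

feats = torch.randn(168423, 4)                   # true layout: N_voxel x 4 floats
n_rows = feats.numel() // 64                     # 673692 // 64 = 10526
as_64 = feats.reshape(-1)[:n_rows * 64].view(n_rows, 64)
print(as_64.shape)                               # torch.Size([10526, 64])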
Hi, I have corrected args['spconv']['num_features_in'] to 4 in all SECOND-related yaml files (see the excerpt below). Both spconv 1.2.1 and spconv 2.x now run.
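For reference, a sketch of the corrected setting; the exact nesting here is my assumption pieced together from the keys quoted in this thread (model/m3/encoder_args and args['spconv']['num_features_in']), so adjust it to each yaml's actual layout:

model:
  m3:
    encoder_args:
      spconv:
        num_features_in: 4   # was 64; must equal the voxel feature width (x, y, z, intensity)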
Thanks for catching this!