TransUNet
Is there something wrong?
I am looking at your code and found a problem:
https://github.com/Beckschen/TransUNet/blob/main/networks/vit_seg_modeling.py#L133
patch_size = (img_size[0] // 16 // grid_size[0], img_size[1] // 16 // grid_size[1])
The training image size is 224,
so patch_size = (224 // 16 // 16, 224 // 16 // 16) = (0, 0)?
Why?
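For reference, the arithmetic can be checked step by step. This is only a minimal sketch, assuming img_size = (224, 224) and grid_size = (16, 16) as in the question; the intermediate name feature_size is introduced here for illustration and is not taken from the repository.

```python
# Values assumed from the question: 224x224 input and grid_size (16, 16).
img_size = (224, 224)
grid_size = (16, 16)

# `//` is floor division and is evaluated left to right,
# so the division by 16 happens before the division by grid_size.
feature_size = (img_size[0] // 16, img_size[1] // 16)
patch_size = (feature_size[0] // grid_size[0], feature_size[1] // grid_size[1])

print(feature_size)  # (14, 14)
print(patch_size)    # (0, 0)
```

So with these two values the quoted expression does evaluate to (0, 0), because 14 // 16 floors to 0.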
Hello,
Many thanks for your questions. The patch size you calculated is (1, 1) in feature grids, representing (16, 16) in image level since the image is downsampled 16x through resnet. Let me know if you have any questions.
I have the same problem when training: patch_size equals (0, 0) after the line 'patch_size = (img_size[0] // 16 // grid_size[0], img_size[1] // 16 // grid_size[1])'. I don't understand the reply from Beckschen. Could you explain a little more?
I think the grid_size hyperparameter in the config should be (14, 14) instead of (16, 16), since the paper clearly states that img_size is (224, 224).
Can you confirm this, @Beckschen?
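A quick sketch to compare the two settings. Note that feature_patch_size is a hypothetical helper written for this comment, not a function from the repository, and the factor of 16 is the ResNet downsampling mentioned in the reply above.

```python
def feature_patch_size(img_size, grid_size, downsample=16):
    """Hypothetical helper mirroring the patch_size expression quoted above."""
    return tuple(s // downsample // g for s, g in zip(img_size, grid_size))


img_size = (224, 224)
results = {}
for grid_size in [(16, 16), (14, 14)]:
    patch = feature_patch_size(img_size, grid_size)
    # Image-level patch size, assuming each feature cell covers 16x16 pixels.
    image_level = tuple(p * 16 for p in patch)
    results[grid_size] = (patch, image_level)
    print(grid_size, patch, image_level)

# (16, 16) (0, 0) (0, 0)
# (14, 14) (1, 1) (16, 16)
```

With grid_size = (14, 14) the result matches the (1, 1) feature-level and (16, 16) image-level values described in the reply. With grid_size = (16, 16), the same expression only gives (1, 1) for a 256x256 input.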
I think you are right.