Hi, I'm running into a weight structure mismatch when I run test.py:
if args.EVALUATION.ckpt_used is not None:
    filepath = os.path.join(root_model, f'{args.EVALUATION.ckpt_used}.pth')
    assert os.path.isfile(filepath), filepath
    print("=> loading model weight '{}'".format(filepath), flush=True)
    checkpoint = torch.load(filepath)
    model.load_state_dict(checkpoint['state_dict'])
    print("=> loaded model weight '{}'".format(filepath), flush=True)
https://github.com/Seunggu0305/VLCounter/blob/2dc15ddd218744c2c3c63b667fa0bc4a24ce8c3c/tools/models/VLCounter.py#L36
Can you comment out the above line and try running the test again?
Sorry, I set the flag variable to False as you said, but it still doesn't work. I hope to get your help.
You should leave the flag variable set to True and comment out the mentioned line. Try replacing VLCounter.py L36~L39 with the code below.
# if flag:
self.gn = nn.GroupNorm(8, out_channels)
self.gelu = nn.GELU()
self.up = nn.UpsamplingBilinear2d(scale_factor=2)
Hello dear author, I noticed that the weight for the contrastive loss is set to 1e-6, which suggests that contrastive learning doesn't play a major role. May I ask why you set the weight so small?
Thank you very much for your previous patience, and I look forward to your response.
Was the problem solved by the method mentioned above?
The reason for setting the lambda value small is simply to balance the scales of the two loss terms. Since the value of the L2 loss is very small, the lambda on the contrastive loss should also be small; with the scales matched, contrastive learning can still be considered to play a significant role.
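To illustrate the scale argument (this is not the actual training code; the loss names and magnitudes below are hypothetical), the small lambda only brings the contrastive term down to the scale of the tiny L2 term:

import torch
import torch.nn.functional as F

# Hypothetical magnitudes, only to illustrate why a lambda of 1e-6 balances the scales.
pred_density = torch.rand(4, 1, 64, 64) * 1e-2   # density maps have tiny values
gt_density = torch.rand(4, 1, 64, 64) * 1e-2
loss_l2 = F.mse_loss(pred_density, gt_density)   # on the order of 1e-5

loss_contrast = torch.tensor(3.0)                # contrastive losses are typically O(1)
lam = 1e-6                                       # rescales the contrastive term

total_loss = loss_l2 + lam * loss_contrast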
Hello author, first of all, thank you for your patient reply. My previous questions have been answered, and I would like to express my sincere thanks!
I still have a question for you. While reproducing your code, I found that the pre-trained weights loaded by both the visual and text encoders in VLCounter.py are ViT-B-16.pt. May I ask the reason for that? Also, after checking the Hugging Face website, I found that the CLIP weights are distributed as "pytorch_model.bin", so how did you obtain "ViT-B-16.pt"? Looking forward to your reply!
You can download the *.pt weight files of CLIP from the original repo:
- https://github.com/openai/CLIP/blob/a1d071733d7111c9c014f024669f959182114e33/clip/clip.py#L30-L40
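For reference, the official clip package downloads exactly that .pt file on first use (a small sketch assuming the package is installed; the download_root path below is just an example):

import clip

# Downloads ViT-B-16.pt (the JIT checkpoint, not Hugging Face's pytorch_model.bin)
# into ./pretrain on the first call; afterwards the file can be used directly.
model, preprocess = clip.load("ViT-B/16", download_root="./pretrain")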
The replacement suggested above (commenting out the if flag: line) does not help, because the last decoder layer has only one output channel, which is not divisible by the group number 8. The exception occurs at https://github.com/Seunggu0305/VLCounter/blob/df198668d977c0afe9ca09c8c767f2f125aabf5c/tools/models/VLCounter.py#L85 with "ValueError: num_channels must be divisible by num_groups".
Maybe the group number should be 1 at the last layer, like this:
if flag:
    self.gn = nn.GroupNorm(8, out_channels)
else:
    self.gn = nn.GroupNorm(1, out_channels)
self.gelu = nn.GELU()
self.up = nn.UpsamplingBilinear2d(scale_factor=2)
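For anyone hitting the same ValueError, a standalone way to check the constraint (the tensor shape below is only illustrative):

import torch
import torch.nn as nn

x = torch.randn(2, 1, 8, 8)   # the last decoder layer outputs a single channel
# nn.GroupNorm(8, 1)          # raises ValueError: num_channels must be divisible by num_groups
y = nn.GroupNorm(1, 1)(x)     # one group is always valid, whatever the channel count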
The reported results can be reproduced with the num_groups=1 modification: 'MAE': 16.951104744592634, 'RMSE': 106.03263784390961
It's great to hear from you, and I'll try to follow up on the changes as you described.