Baifeng Shi comments

34 comments by Baifeng Shi

Performance of FAN_tiny on ImageNet1K

Hi! I've tried training FAN-S and I can reproduce the results in the paper. However, when I trained FAN-L, I found that the validation accuracy reaches a peak of ~83.5...

Performance of FAN_tiny on ImageNet1K

Thanks for the suggestion! I will try that.

About training_all on ActivityNet

Hi, have you tried going to `/lib/core/config.py` and changing `config.DATASET_NAME` to `ActivityNet12`?
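The change amounts to setting one attribute before any dataset code reads it. The sketch below assumes `config.py` exposes a module-level `config` object with a `DATASET_NAME` attribute; the `SimpleNamespace` stand-in and its placeholder default are hypothetical, so the real file may be structured differently.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the `config` object in /lib/core/config.py.
# The placeholder default below is illustrative only, not the repo's value.
config = SimpleNamespace(DATASET_NAME="default_dataset")

# Either edit the assignment in config.py itself, e.g.
#     config.DATASET_NAME = 'ActivityNet12'
# or override it at runtime before the data loaders read it:
config.DATASET_NAME = "ActivityNet12"
print(config.DATASET_NAME)
```

Editing the file keeps the change persistent; overriding at runtime only works if nothing has already copied the old value.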

About the ProtoNet with Conv4 backbone result

me too

questions about top_down_transform

Hi, that's a good question. This part selects the relevant features along the channel dimension, whereas the previous selection operates along the spatial dimension. We find selecting on...
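To illustrate the distinction, here is a toy sketch, not the repo's `top_down_transform`. Spatial selection reweights whole tokens, while channel selection reweights the same channels in every token. The cosine-similarity weighting against a `prior` vector and the fixed per-channel `gate` are assumptions chosen for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def spatial_select(tokens, prior):
    """Scale each token (spatial position) by its clamped similarity to the prior."""
    out = []
    for t in tokens:
        w = max(cosine(t, prior), 0.0)  # one weight per token
        out.append([w * v for v in t])
    return out

def channel_select(tokens, gate):
    """Scale every token's channels by a shared per-channel gate."""
    return [[v * g for v, g in zip(t, gate)] for t in tokens]

tokens = [[1.0, 1.0], [0.0, 1.0]]  # 2 tokens x 2 channels
prior = [1.0, 0.0]                 # hypothetical top-down prior
gate = [1.0, 0.0]                  # hypothetical gate keeping only channel 0

out = channel_select(spatial_select(tokens, prior), gate)
print(out)
```

The spatial step suppresses the second token (orthogonal to the prior), and the channel step then suppresses channel 1 across all remaining tokens.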

Unsatisfactory demo result

Hi, sorry for the late response. Are you testing the attention on a single-object image or a multi-object image? The phone image contains a single object, while the screenshot you show here seems to come from...

Will the code for the visual-language task be released?

Yes, sorry about the delay. I should be able to release the code this week or next.

`model.load_state_dict(checkpoint['state_dict'])`

Hi, could you provide the detailed error message? Does it occur when loading specific parameters?
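A frequent cause of `load_state_dict` failures is a key mismatch between the checkpoint and the model. In PyTorch, calling `load_state_dict(state_dict, strict=False)` returns the `missing_keys` and `unexpected_keys`, which helps locate the problem. The pure-Python sketch below mimics that check with plain dicts standing in for tensors, and assumes the mismatch is the `module.` prefix that `nn.DataParallel` adds to saved keys; the helper names are hypothetical.

```python
def strip_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with `prefix` removed from keys that have it."""
    return {(k[len(prefix):] if k.startswith(prefix) else k): v
            for k, v in state_dict.items()}

def diff_keys(ckpt_keys, model_keys):
    """Keys the model expects but the checkpoint lacks, and vice versa."""
    ckpt_keys, model_keys = set(ckpt_keys), set(model_keys)
    return {
        "missing": sorted(model_keys - ckpt_keys),
        "unexpected": sorted(ckpt_keys - model_keys),
    }

# Toy checkpoint saved from a DataParallel-wrapped model (ints stand in for tensors).
ckpt = {"module.conv.weight": 0, "module.fc.weight": 1}
model_keys = ["conv.weight", "fc.weight"]

print(diff_keys(ckpt, model_keys))                # every key mismatches
print(diff_keys(strip_prefix(ckpt), model_keys))  # no mismatches after stripping
```

If the lists are still non-empty after stripping, the checkpoint likely comes from a different architecture or model variant.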

Can S2 be used when training with the ViT unfrozen?

Hi, the VILA-3B-S2 results were obtained with the ViT unfrozen. We didn't observe any negative effect from that.

Hi, have you compared S2 with scales [384, 768] against interpolating to 768x768?

Good point. In the [paper](https://arxiv.org/pdf/2403.13043) (Table 12), we compare S2 against directly extracting features from the larger image without splitting, and it turns out that direct extraction is much less efficient and has worse...
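For contrast with direct extraction, the S2 idea can be sketched as follows: resize the image to each scale, split the larger versions into base-size sub-images, run the backbone on each sub-image, stitch the feature maps back together, average-pool them to the base size, and concatenate across scales. This is a toy pure-Python sketch, not the official implementation. Images are 2D lists, scales are integer multiples of the base size (so [384, 768] becomes `scales=[1, 2]` with base 384), and `toy_backbone` is a hypothetical per-pixel stand-in for a ViT.

```python
def resize_nearest(img, size):
    """Nearest-neighbour resize of a square 2D list to size x size."""
    n = len(img)
    return [[img[i * n // size][j * n // size] for j in range(size)]
            for i in range(size)]

def split_tiles(img, base):
    """Split a (k*base) x (k*base) image into a k x k grid of base x base tiles."""
    k = len(img) // base
    return [[[row[c * base:(c + 1) * base] for row in img[r * base:(r + 1) * base]]
             for c in range(k)] for r in range(k)]

def merge_tiles(grid):
    """Stitch a k x k grid of equal-size tiles back into one 2D map."""
    base = len(grid[0][0])
    out = []
    for tile_row in grid:
        for i in range(base):
            out.append([v for tile in tile_row for v in tile[i]])
    return out

def avg_pool(fmap, factor):
    """Average-pool a square 2D map by an integer factor."""
    size = len(fmap) // factor
    return [[sum(fmap[i * factor + a][j * factor + b]
                 for a in range(factor) for b in range(factor)) / factor ** 2
             for j in range(size)] for i in range(size)]

def s2_features(img, backbone, base, scales):
    per_scale = []
    for s in scales:
        big = resize_nearest(img, s * base)   # interpolate to this scale
        tiles = split_tiles(big, base)        # sub-images at the base size
        feats = [[backbone(t) for t in row] for row in tiles]
        per_scale.append(avg_pool(merge_tiles(feats), s))  # back to base size
    # Concatenate along channels: each location gets one feature per scale.
    return [[[m[i][j] for m in per_scale] for j in range(base)]
            for i in range(base)]

def toy_backbone(tile):
    """Hypothetical backbone: doubles every pixel of a base-size tile."""
    return [[2 * v for v in row] for row in tile]

out = s2_features([[1.0, 2.0], [3.0, 4.0]], toy_backbone, base=2, scales=[1, 2])
print(out)
```

Because the backbone only ever sees base-size inputs, it runs at the resolution it was pre-trained on; direct extraction would instead feed the whole 768x768 image to the backbone at once.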