Baifeng Shi

Results: 34 comments by Baifeng Shi

Hi! I've tried training FAN-S and can reproduce the results in the paper. However, when I train FAN-L, I find that the validation accuracy peaks at ~83.5...

Thanks for the suggestion! I will try that.

Hi, have you tried going to `/lib/core/config.py` and changing `config.DATASET_NAME` to `ActivityNet12`?

Hi, that's a good question. This part is for selecting the relevant features on the channel dimension while the previous selection is on the spatial dimension. We find selecting on...

Hi, sorry for the late response. Are you testing the attention on a single-object image or a multi-object image? The phone image is single-object, while the screenshot you show here seems from...

Yes, sorry about the delay. I should be able to release the code this week or next.

Hi, could you provide the detailed error info? Does it occur when loading specific parameters?

Hi, the VILA-3B-S2 results are from a model trained with the ViT unfrozen. We didn't observe any negative effect from that.

Good point. In the [paper](https://arxiv.org/pdf/2403.13043) we compare S2 against directly extracting features from a larger image without splitting (Table 12), and it turns out the latter is much less efficient and has worse...