Renrui Zhang comments

Results: 56 comments of Renrui Zhang

The "alpha" and "beta" in the paper are the opposite of the "alpha" and "beta" in the code of Tip-Adapter

@euminds Thanks for pointing this out. We have fixed this and released a new code base in a repo. Concerning 65.45% for Tip-Adapter, the released code would achieve 65.51% on my...

Bug when I try cifar100

@heng-yin Thanks for your great patience! We have released the code for other datasets in the repo. Maybe you can follow their configs to implement it on CIFAR100.

Run TIP-adapter on text2img retrieval instead

Thanks for your interest! I suppose if the query and values are within the same embedding space, e.g., both text features, they can directly calculate the affinity matching and produce...
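As a rough illustration of the affinity matching mentioned in this comment, the sketch below computes cosine affinities between one query feature and a set of cached key features in the same embedding space, then sharpens them with an exponential. The function names, the `beta` value, and the toy vectors are assumptions made for illustration, not code from the repo.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cache_affinity(query, keys, beta=5.0):
    """Return one sharpened affinity weight per key.

    beta is a hypothetical sharpness value; larger beta concentrates
    the weight on the keys closest to the query.
    """
    q = l2_normalize(query)
    weights = []
    for key in keys:
        k = l2_normalize(key)
        cos = sum(a * b for a, b in zip(q, k))  # cosine affinity in [-1, 1]
        weights.append(math.exp(-beta * (1.0 - cos)))
    return weights

# Toy features: the first key points the same way as the query,
# the second key is orthogonal to it.
query = [1.0, 0.0]
keys = [[2.0, 0.0], [0.0, 3.0]]
weights = cache_affinity(query, keys)
```

The key aligned with the query gets the largest weight, and the orthogonal key is strongly down-weighted.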

Run TIP-adapter on text2img retrieval instead

That is quite an insightful question. I tried it on some datasets with varying K for different categories. Generally, a larger K leads to higher classification accuracy for the corresponding category....

What is the meaning of refpoint_embed?

Thanks for your interest. The 6 dimensions denote (x, y, l, r, t, b), representing the xy center and the distances from the center to the four box boundaries (left, right, top, bottom).
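To make the layout concrete, here is a minimal sketch that converts such a 6-dimensional reference point into box corners. The function name `refpoint_to_xyxy` and the example values are hypothetical, and the sketch assumes image coordinates with y increasing downward, so the top edge is at `y - t`.

```python
def refpoint_to_xyxy(ref):
    """Convert (x, y, l, r, t, b) into (x_min, y_min, x_max, y_max).

    Assumes y grows downward (image convention), which is an assumption
    of this sketch rather than something stated in the comment.
    """
    x, y, l, r, t, b = ref
    return (x - l, y - t, x + r, y + b)

# The point (0.5, 0.5) is not the box center here: the distances to the
# left/right and top/bottom boundaries differ.
ref = (0.5, 0.5, 0.2, 0.1, 0.3, 0.1)
box = refpoint_to_xyxy(ref)
```

Because the four distances are stored separately, the reference point can sit anywhere inside the box instead of being forced to its geometric center.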

What is the meaning of refpoint_embed?

The (l, r, t, b) format is specifically for monocular 3D object detection, as adopted by [MonoFlex](https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_Objects_Are_Different_Flexible_Monocular_3D_Object_Detection_CVPR_2021_paper.pdf), since the projected 3D center may not be located at the center of the 2D box....