ydhong_hit
I estimate that the variance of some SOTA methods, such as Segmenter and BEiT, is more than 1% mIoU on ADE20K.
> Hi, could you provide the code of the grounding transformer consistent with the paper, including both Feud and mixsoft, not a separate implementation of Feud or mixsoft alone but the intact...
> Thanks for your interest. Initially, we did not release the configs and pre-trained weights for Cityscapes because the video data of Cityscapes is extremely large (>300 GB). Nevertheless, we will...
> Hi ydhong. Have you tried training lawin-B2 for 160K iterations? The performance reported in Table 1 was obtained with a 160K-iteration training schedule. Thanks for your reply. According to...
> Yes, it should be 512. Also, I recommend switching the proj_type in PatchEmbed from 'pool' to 'conv', which uses the group conv in place of the mix pooling at...
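
For reference, a minimal sketch of what a switchable proj_type in a PatchEmbed-style module could look like. This is a hypothetical illustration, not the repo's actual class: the name PatchEmbedSketch and the parameters dim and patch_size are assumptions, and AvgPool2d stands in for the mix pooling.

```python
import torch.nn as nn

class PatchEmbedSketch(nn.Module):
    """Hypothetical sketch: downsample features with either a grouped conv
    or pooling, selected by proj_type (mirroring the suggestion above)."""

    def __init__(self, dim=512, patch_size=8, proj_type='conv'):
        super().__init__()
        if proj_type == 'conv':
            # grouped (here depthwise) convolution in place of the mix pooling
            self.proj = nn.Conv2d(dim, dim, kernel_size=patch_size,
                                  stride=patch_size, groups=dim)
        elif proj_type == 'pool':
            # pooling-based downsampling (placeholder for the mix pooling)
            self.proj = nn.AvgPool2d(kernel_size=patch_size, stride=patch_size)
        else:
            raise ValueError(f"unknown proj_type: {proj_type}")

    def forward(self, x):
        # x: (B, dim, H, W) -> (B, dim, H // patch_size, W // patch_size)
        return self.proj(x)
```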
During the inference stage, I find that your model requires the input resolution to be a multiple of 64. For ADE20K, I use 'ResizeToMultiple' in mmseg to achieve this...
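
A minimal sketch of an mmseg test pipeline using ResizeToMultiple with size_divisor=64, as described above. The surrounding pipeline steps (img_scale, normalization values) are illustrative assumptions, not the exact config of this repo:

```python
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(2048, 512),  # assumed ADE20K test scale
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            # round H and W up to the nearest multiple of 64
            dict(type='ResizeToMultiple', size_divisor=64),
            dict(type='Normalize',
                 mean=[123.675, 116.28, 103.53],
                 std=[58.395, 57.12, 57.375],
                 to_rgb=True),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
```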
> Sorry for the late reply. Honestly, we have to delay the full-code release plan because lawin has not been accepted by any conference or journal so far, and...
The resize_pos_embed function is not used [here](https://github.com/facebookresearch/dinov2/blob/main/dinov2/eval/segmentation_m2f/models/backbones/vit.py). What should we do when the pre-training resolution is inconsistent with the downstream input resolution?
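
The usual fix is to bicubically interpolate the pre-trained position embedding to the downstream grid size (the DeiT/DINOv2-style interpolate_pos_encoding approach). A minimal sketch, assuming the standard ViT layout of one class token followed by a square H×W grid of patch tokens; this is the common technique, not necessarily what this backbone does internally:

```python
import math
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed, new_hw):
    """Resize a ViT position embedding to a new patch grid.

    pos_embed: (1, 1 + N, C), class token first, N = H * W patch tokens.
    new_hw: (H', W') target grid size for the downstream input resolution.
    """
    cls_tok, grid = pos_embed[:, :1], pos_embed[:, 1:]
    n, c = grid.shape[1], grid.shape[2]
    hw = int(math.sqrt(n))
    assert hw * hw == n, "expects a square pre-training patch grid"
    # (1, N, C) -> (1, C, H, W) so F.interpolate can resize spatially
    grid = grid.reshape(1, hw, hw, c).permute(0, 3, 1, 2)
    grid = F.interpolate(grid, size=new_hw, mode='bicubic', align_corners=False)
    # back to token layout: (1, H' * W', C)
    grid = grid.permute(0, 2, 3, 1).reshape(1, new_hw[0] * new_hw[1], c)
    return torch.cat([cls_tok, grid], dim=1)
```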
> Could you provide the training config file of EVA-02-L on cocostuff164k which achieves 53.7 mIoU? I found the training config in the appendix, but you do not use the...