Yuxuan Zhang
Hi! Thanks for your reply! Useful solutions! But while following your "single common prompt" suggestion, two issues came up: 1. `concept_list` should be None, otherwise `class_data_dir` won't work. And this...
Use an older transformers version, for example 4.31.0.
Hi, you may try prompting the model with "[semantic] human face", which matches the annotation used in our training datasets. That said, if you need highly accurate segmentation masks, our model may not...
Furthermore, if you want to include human hair, try prompting with "[semantic] head".
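Concretely, the prompt format above can be wrapped in a tiny helper. This is only a sketch; `build_prompt` is a name I made up, not part of the EVF-SAM codebase:

```python
def build_prompt(target: str) -> str:
    # EVF-SAM's semantic-level prompts use the "[semantic]" prefix,
    # matching the training annotation described above.
    return f"[semantic] {target}"

# "[semantic] human face" excludes hair; "[semantic] head" includes it.
print(build_prompt("human face"))  # -> [semantic] human face
print(build_prompt("head"))        # -> [semantic] head
```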
1. The 224x224 image is responsible for providing a coarse indication of the grounded object, so I guess it has little to do with fine-grained segmentation quality. 2. Our...
1. I overlooked multi-frame conditioning. I will take a look at that once I'm done with my current work. Thanks! 2. I suggest you simply prompt the model...
EVF-SAM doesn't support multi-class segmentation within one inference pass. You may consider batch inference instead.
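One way to emulate multi-class output is to run one prompt per class and collect the masks. A sketch, where `predict_mask` stands in for whatever single-prompt inference call you already use (it is not the real EVF-SAM API):

```python
def segment_classes(image, class_names, predict_mask):
    # predict_mask(image, prompt) -> mask: any single-prompt inference function.
    # Runs one inference per class; batch the prompts if throughput matters.
    return {name: predict_mask(image, f"[semantic] {name}") for name in class_names}

# Usage with a stand-in predictor (replace with your real inference call):
masks = segment_classes("img.jpg", ["cat", "dog"], lambda img, p: f"mask for {p}")
print(masks["cat"])  # -> mask for [semantic] cat
```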
We select from the original o365 annotations by excluding categories with more than one instance per image. Then we apply SAM-2 to convert the bounding boxes into segmentation masks. Easy code...
Here is our pipeline to produce the o365 RES data:
1. git clone sam2
2. run this py script:
```
from collections import Counter
import json
import torch
import cv2
from tqdm import tqdm
...
```
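The instance-filtering step (keep only categories that appear exactly once per image) can be sketched as below. The field names follow the COCO-style annotation layout that Objects365 uses, but treat them as assumptions:

```python
from collections import Counter

def single_instance_anns(annotations):
    """Keep annotations whose (image_id, category_id) pair occurs exactly once."""
    counts = Counter((a["image_id"], a["category_id"]) for a in annotations)
    return [a for a in annotations
            if counts[(a["image_id"], a["category_id"])] == 1]

anns = [
    {"image_id": 1, "category_id": 5, "bbox": [0, 0, 10, 10]},
    {"image_id": 1, "category_id": 5, "bbox": [20, 20, 10, 10]},  # duplicate category -> dropped
    {"image_id": 1, "category_id": 7, "bbox": [5, 5, 10, 10]},    # unique in its image -> kept
]
print(single_instance_anns(anns))
```

The kept boxes would then be passed to SAM-2 for box-to-mask conversion, which is omitted here.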
Hi, thank you for reproducing our work! Our BEiT experiment is meant to prove the effectiveness of "early fusion", where "late" means using beit3 to extract separate single-modal features and concatenating them....
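To make the early/late distinction concrete, here is a minimal numpy sketch with stand-in encoders (not BEiT-3 itself): late fusion encodes each modality on its own and only concatenates at the end, while early fusion concatenates the token streams first so one joint encoder can attend across modalities.

```python
import numpy as np

def late_fusion(img_tokens, txt_tokens, img_enc, txt_enc):
    # Each modality is encoded separately; features only meet at the final concat.
    return np.concatenate([img_enc(img_tokens), txt_enc(txt_tokens)], axis=0)

def early_fusion(img_tokens, txt_tokens, joint_enc):
    # Token streams are merged first, so the encoder can mix modalities internally.
    return joint_enc(np.concatenate([img_tokens, txt_tokens], axis=0))

# Identity "encoders" just to show the shapes:
img = np.zeros((4, 8))   # 4 image tokens, dim 8
txt = np.zeros((3, 8))   # 3 text tokens, dim 8
identity = lambda x: x
assert late_fusion(img, txt, identity, identity).shape == (7, 8)
assert early_fusion(img, txt, identity).shape == (7, 8)
```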