
[CVPR2024] The code for "Osprey: Pixel Understanding with Visual Instruction Tuning"

Results 13 Osprey issues

Hello, during training, have you ever seen the run hang in the first epoch while GPU utilization stays at 100%? At first I thought it was a server problem, but training runs normally as soon as I delete the MASK token part of the code:

```
if cur_input_ids.numel() > 0:
    if getattr(self.config, 'tune_mm_mlp_adapter', False) and getattr(self.config, 'mm_use_im_start_end', False):
        mask_idx = torch.nonzero(cur_input_ids==self.tokenizer.convert_tokens_to_ids([''])[0])
        _l = 0
        for i, idx in enumerate(mask_idx):
            cur_new_input_embeds.append(self.get_model().embed_tokens(cur_input_ids[_l:idx[0]]).detach())
            ##...
```

![image](https://github.com/CircleRadon/Osprey/assets/30163938/77bc687b-5642-40ec-8f82-9c246feddc75)

Great work! Could you release the data annotation pipeline?

Thank you for the amazing work! I saw the Offline demo you introduced, but it seems that this process still requires manual clicking on specific masks to generate a specific...

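Hello, thank you for your work and for open-sourcing the code 👍👍👍! I have two questions:
- Were any video-domain multimodal datasets used at any point in training Osprey, for example MSR-VTT, MSVD, or VATEX?
- I see that you used COCO, RefCOCO, and other datasets. Do they include MS-COCO? It seems that MS-COCO is a subset of COCO 😀

Many thanks! 💐💐💐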

Hi! Thanks for the great work! Could you share any configs on fine-tuning Osprey on RefCOCOg dataset? I am trying to follow your work and reproduce the results on it,...

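In Osprey, ConvNeXt should produce 1024 image feature tokens (a 1024 * 768 feature). Combined with the mask feature (128 + 64 + 32 + 16) and pos tokens, plus the text tokens, won't the sequence easily exceed 2048 by a wide margin? If I have misunderstood any of these numbers, please correct me. Thank you very much~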
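The arithmetic behind this question can be sketched as follows. This is only a rough illustration built on the asker's own numbers, which the thread does not confirm: it assumes each region contributes 128 + 64 + 32 + 16 tokens, ignores the pos tokens (their count is not given), and uses a hypothetical helper name `remaining_budget`.

```python
# All constants come from the question above; they are the asker's reading,
# not confirmed values for the actual model.
IMAGE_TOKENS = 1024                    # image feature tokens from ConvNeXt (as stated)
MASK_FEATURE_SIZES = (128, 64, 32, 16) # assumed here to be token counts per region
CONTEXT_LIMIT = 2048                   # sequence length mentioned in the question


def remaining_budget(image_tokens, mask_sizes, num_regions, context_limit):
    """Return the tokens left for text (hypothetical helper).

    Assumes every region adds sum(mask_sizes) tokens and ignores pos tokens.
    A negative result means the limit is already exceeded before any text.
    """
    per_region = sum(mask_sizes)
    used = image_tokens + num_regions * per_region
    return context_limit - used


if __name__ == "__main__":
    for n in range(1, 6):
        left = remaining_budget(IMAGE_TOKENS, MASK_FEATURE_SIZES, n, CONTEXT_LIMIT)
        print(f"{n} region(s): {left} tokens left for text")
```

Under these assumptions, the budget shrinks by 240 tokens per region, so the concern grows with the number of regions in a prompt. If the four values are feature dimensions rather than token counts, the per-region cost would be much smaller than this sketch suggests.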

Hello, when I try to evaluate Open-Vocabulary Segmentation on Cityscapes, I get the following errors: [02/07 15:28:29 detectron2]: Start inference on 500 batches 0it [00:14, ?it/s] Traceback (most recent call...

![image](https://github.com/CircleRadon/Osprey/assets/108161275/1cc5471e-30c7-4093-b1f2-5f18b6bf0541) Hello, the VG images link is invalid. Could you tell me how to download the images?