
Question about the dataset.

Rexzhan opened this issue 1 year ago · 8 comments

Nice work! But I ran into an error when trying to reproduce your training process. Could you tell me how to prepare these three files: "coco/annotations/panoptic_train2017_filtrefgumdval.json", "coco/annotations/captions_train2017_filtrefgumdval.json", and "coco/annotations/grounding_train2017_filtrefgumd.json"? Following your dataset preparation guide, I only ended up with panoptic_train2017_filtrefgumdval_filtvlp/captions_train2017_filtrefgumdval_filtvlp/grounding_train2017_filtrefgumdval_filtvlp.

Rexzhan avatar Dec 30 '23 02:12 Rexzhan

Thanks so much for your reminder, I just updated the new files at:

https://huggingface.co/xdecoder/SEEM/blob/main/panoptic_train2017_filtrefgumdval.json
https://huggingface.co/xdecoder/SEEM/blob/main/grounding_train2017_filtrefgumd.json
https://huggingface.co/xdecoder/SEEM/blob/main/coco_train2017_filtrefgumdval_lvis.json
https://huggingface.co/xdecoder/SEEM/blob/main/captions_train2017_filtrefgumdval.json
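
For anyone scripting the download, here is a minimal sketch using the huggingface_hub client (the repo id comes from the URLs above; files land in the default HF cache):

from huggingface_hub import hf_hub_download

files = [
    "panoptic_train2017_filtrefgumdval.json",
    "grounding_train2017_filtrefgumd.json",
    "coco_train2017_filtrefgumdval_lvis.json",
    "captions_train2017_filtrefgumdval.json",
]
for name in files:
    # Fetches the file into the local HF cache and returns its resolved path.
    path = hf_hub_download(repo_id="xdecoder/SEEM", filename=name)
    print(path)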

MaureenZOU avatar Dec 30 '23 03:12 MaureenZOU

@MaureenZOU Thanks for the update! It seems these new files are not reflected in the README yet. To reproduce SEEM, I assume these new files are the correct ones to use?

ziqipang avatar Dec 30 '23 23:12 ziqipang

@ziqipang @MaureenZOU May I ask at which point the model consumes coco_caption_karpathy_test.arrow, filtcoco2017val_caption_karpathy_train.arrow, ... in pretrain_arrows_code224 during the training stage? I followed the script in TRAIN.md, but the provided config files do not seem to use vlp_dataset/coco_caption_karpathy. Please correct me if I am wrong.

Rexzhan avatar Dec 31 '23 03:12 Rexzhan

@Rexzhan I think X-Decoder uses that vision-language data, but SEEM only uses the segmentation data. I am just starting to work in this field, so please double-check whether my understanding is correct.
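
One quick way to check is to inspect which datasets a training config actually registers; a minimal sketch, assuming the SEEM configs are YAML files with a DATASETS/TRAIN entry (the path and key names below are assumptions, so adjust to the actual config you launch with):

import yaml

# Path and key names are assumptions; point this at your training config.
with open("configs/seem/focall_unicl_lang_v1.yaml") as f:
    cfg = yaml.safe_load(f)

# If no vlp/caption dataset appears here, the run trains on segmentation data only.
print(cfg.get("DATASETS", {}).get("TRAIN"))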

ziqipang avatar Jan 02 '24 17:01 ziqipang

@MaureenZOU Sorry, but I don't see the file "panoptic_val2017.json" you mentioned in DATASET.md. Could you upload it? Thanks.

xpzwzwz avatar Jan 04 '24 12:01 xpzwzwz

This can be downloaded from the official COCO website.
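
For reference, panoptic_val2017.json ships inside the COCO panoptic annotations archive; a minimal sketch of fetching and extracting just that file (the URL is the standard COCO annotations endpoint; the member path inside the zip is assumed to be annotations/panoptic_val2017.json):

import urllib.request
import zipfile

# The archive contains panoptic_train2017.json and panoptic_val2017.json.
url = "http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip"
urllib.request.urlretrieve(url, "panoptic_annotations_trainval2017.zip")
with zipfile.ZipFile("panoptic_annotations_trainval2017.zip") as zf:
    zf.extract("annotations/panoptic_val2017.json", path="coco")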

MaureenZOU avatar Jan 04 '24 14:01 MaureenZOU

Following DATASET.md, I downloaded the COCO 2017 dataset from the official website (https://cocodataset.org/#download). I also created some *.arrow files, but I can't create all of the files needed for training.

How do I create the "4M Image Text Pairs" with ViLT?

4M Image Text Pairs (X-Decoder)
We follow the exact data preparation for the image text pairs data with [ViLT](https://github.com/dandelin/ViLT/blob/master/DATA.md).
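
Per ViLT's DATA.md, each corpus is converted to .arrow format with a make_arrow helper; a minimal sketch for the COCO Karpathy split, assuming the ViLT repo is on PYTHONPATH and the raw data is laid out as ViLT expects (the paths below are placeholders):

from vilt.utils.write_coco_karpathy import make_arrow

# root: directory holding the raw COCO images plus the Karpathy split json
# arrows_root: output directory for the generated .arrow files
make_arrow("/data/coco", "/data/pretrain_arrows_code224")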

# The pretrained arrow files are put under .xdecoder_data/pretrain_arrows_code224 with the following list of files.
["filtcoco2017val_caption_karpathy_train.arrow", "filtcoco2017val_caption_karpathy_val.arrow", "filtcoco2017val_caption_karpathy_restval.arrow"] + ["code224_vg.arrow"] + [f"code224_sbu_{i}.arrow" for i in range(9)] + [f"code224_conceptual_caption_train_{i}.arrow" for i in range(31)]
# ["filtcoco2017val_caption_karpathy_train.arrow", "filtcoco2017val_caption_karpathy_val.arrow", "filtcoco2017val_caption_karpathy_restval.arrow"] originate from ["coco_caption_karpathy_train.arrow", "coco_caption_karpathy_val.arrow", "coco_caption_karpathy_restval.arrow"] with the images overlapping COCO val2017 removed to avoid information leakage.
To get started quickly:

# Download the COCO Karpathy test set (we hack the training data to be coco_caption_karpathy_test.arrow only for a quick start in the codebase)
wget https://huggingface.co/xdecoder/X-Decoder/resolve/main/coco_caption_karpathy_test.arrow
After dataset preparation, the dataset structure would be:

.xdecoder_data
└── pretrain_arrows_code224/
    ├── coco_caption_karpathy_test.arrow
    ├── *filtcoco2017val_caption_karpathy_train.arrow
    ├── ...
    ├── *code224_vg.arrow
    ├── *code224_sbu_0.arrow
    ├── ...
    ├── *code224_conceptual_caption_train_0.arrow
    └── ...
* Those datasets are optional when only debugging the pipeline; they NEED to be added back when you actually train the model.
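
As a sanity check before launching training, one can verify that every expected arrow file is present (the list below mirrors the one quoted above):

import os

root = ".xdecoder_data/pretrain_arrows_code224"
expected = (
    ["filtcoco2017val_caption_karpathy_train.arrow",
     "filtcoco2017val_caption_karpathy_val.arrow",
     "filtcoco2017val_caption_karpathy_restval.arrow"]
    + ["code224_vg.arrow"]
    + [f"code224_sbu_{i}.arrow" for i in range(9)]
    + [f"code224_conceptual_caption_train_{i}.arrow" for i in range(31)]
)
missing = [f for f in expected if not os.path.exists(os.path.join(root, f))]
print("missing:", missing or "none")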

seungyoungshin avatar Jan 19 '24 03:01 seungyoungshin