How many SAM images were used from ShareGPT4v?

Open OpenJarvisAI opened this issue 10 months ago • 2 comments

I downloaded the ShareGPT4V data used for fine-tuning, but I always get image-not-found errors. I am running the finetune stage.

Does the finetune stage use the ShareGPT4V pretraining data?

The ShareGPT4V finetune data uses only a small amount of data from SAM.

Do we need to download the whole sam_000000 - 0000050 range (500GB of images) for it?

OpenJarvisAI, Apr 12 '24 13:04

We adopt the 100K ShareGPT (caption) data in the SFT. I will calculate the number of image files used in SAM and I can extract them to form a shared link to you (if the total number is not that large). Please stay tuned.
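
For anyone who wants to run the count locally in the meantime, here is a minimal sketch. It assumes the SFT annotation file is a JSON list of records, each with an optional `image` field holding a path relative to the image root, and that SAM images live under a `sam/` prefix. Both the field name and the prefix are assumptions about the layout, not confirmed details of this repo, so adjust them to match your data.

```python
import json
import os
import sys


def referenced_images(records, prefix="sam/"):
    """Return the sorted, de-duplicated image paths that start with `prefix`.

    Assumes each record is a dict whose optional "image" value is a
    relative path string (an assumption about the annotation format).
    """
    return sorted({
        r["image"] for r in records
        if isinstance(r.get("image"), str) and r["image"].startswith(prefix)
    })


def missing_images(paths, image_root):
    """Return the subset of `paths` that do not exist under `image_root`."""
    return [p for p in paths
            if not os.path.isfile(os.path.join(image_root, p))]


# Usage: python count_sam.py <annotations.json> <image_root>
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        records = json.load(f)
    used = referenced_images(records)
    missing = missing_images(used, sys.argv[2])
    print(f"SAM images referenced: {len(used)}")
    print(f"SAM images missing on disk: {len(missing)}")
```

The list of missing paths also shows which SAM archives are actually needed, which may be far fewer than the full set.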

JulianJuaner, Apr 14 '24 02:04

Thanks, but I downloaded sam_000000 sam_000001 sam_0000002 and it seems no file-not-found errors occurred during the whole training process.

BTW, it would be even better if you could share a laion_gpt4_dataset_imags.zip, since many URLs have broken since you downloaded them.

OpenJarvisAI, Apr 14 '24 03:04