
Do you plan on releasing the dataset used to train InternVL 1.5?

Open • Mohamed-Dhouib opened this issue 1 year ago • 1 comment

Hello, as stated on the Hugging Face page for InternVL 1.5, a High-Quality Bilingual Dataset was used to train this model. Do you plan to release this dataset in the future? Thanks!

Mohamed-Dhouib • Apr 24 '24 18:04

Hi, we may release the annotation files in the JSONL format that we use. However, to make them usable for everyone, we will need to write documentation detailing where to place the paths and how to download the images. This will take some time.

czczup • Apr 26 '24 17:04