Pre-Training & SFT datasets

Open Maple-geekZhu opened this issue 3 months ago • 1 comment

Thank you for your excellent work on InternVL3.5! Will the datasets you used during the Pre-Training and SFT phases be made public? In the technical report, you mentioned that some additional data was included in both datasets. How was this data composed or selected? I'd like to know these details so that I can replicate the training pipeline.

Maple-geekZhu avatar Sep 11 '25 03:09 Maple-geekZhu

Thank you for sharing this excellent work. I am very interested in the dataset you mentioned. Would it be possible for the authors to provide access to this dataset, or share instructions on how to obtain it?

showstarpro avatar Sep 18 '25 08:09 showstarpro