Training datasets for MiroThinker

Open · linghan1997 opened this issue 3 months ago · 1 comment

Hi, thank you for your excellent work and the high-quality open-source releases. Since you have also open-sourced several datasets for both SFT and RL, such as miromind-M1 and miroverse-v0.1, I would like to ask which datasets were used to train MiroThinker v1.0 or v1.5?

linghan1997 · Jan 07 '26 07:01

Hi, thank you for your interest! We open-sourced some SFT/RL datasets a few months ago. MiroThinker v1.5 also uses internal synthetic/curated questions and trajectories, which we’re not planning to release for now given the cost and QA overhead. We’ll keep the repository updated if anything changes.

jenny-miromind · Jan 09 '26 02:01