tangbo-sh
The technical report says that only 8B of data was used for long-context training, whereas the README says that 200B was trained with a 16K window. So I am...