tangbo-sh

Results: 1 comment by tangbo-sh

The technical report says that only 8B of data is used for long-context training, while the README says that 200B is trained on the 16K window. So I am...