Yuxuan Xia
This is the current env-check.sh result  
We cannot reproduce this issue. Memory usage with a 2k input is always larger than with a 1k input. If Quantized KV Cache is used, the second-token latency of the longer sequence might be lower than that of the shorter sequence...
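For reference, here is a rough back-of-the-envelope sketch of why KV cache memory grows with input length. The model dimensions below (32 layers, 32 KV heads, head size 128) are assumptions chosen for illustration only, not the configuration from this issue, and the helper `kv_cache_bytes` is hypothetical. It counts only the KV cache itself and ignores weights, activations, and any quantization metadata.

```python
def kv_cache_bytes(seq_len, num_layers=32, num_kv_heads=32, head_dim=128,
                   bytes_per_elem=2, batch_size=1):
    """Estimate KV cache size in bytes (hypothetical helper, assumed dims)."""
    # Factor of 2: one tensor for keys and one for values per layer.
    return (2 * batch_size * num_layers * num_kv_heads
            * head_dim * seq_len * bytes_per_elem)


MIB = 1024 ** 2

# FP16 cache (2 bytes per element) for 1k and 2k token inputs.
fp16_1k = kv_cache_bytes(1024)
fp16_2k = kv_cache_bytes(2048)

# 8-bit quantized cache (1 byte per element) for a 2k token input.
int8_2k = kv_cache_bytes(2048, bytes_per_elem=1)

print(f"fp16, 1k tokens: {fp16_1k / MIB:.0f} MiB")
print(f"fp16, 2k tokens: {fp16_2k / MIB:.0f} MiB")
print(f"int8, 2k tokens: {int8_2k / MIB:.0f} MiB")
```

Under these assumptions the cache scales linearly with sequence length at a fixed element size, so the 2k case is always larger than the 1k case unless the element size changes.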
I think the pretrained full model is provided in the repo, but it is not easy to find. You can check this [link](https://drive.google.com/drive/folders/15wx9vOM0euyizq-M1uINgN0_wjVRf9J3).