Yuxuan Xia

3 comments by Yuxuan Xia

This is the current env-check.sh result: ![env-check2](https://github.com/intel-analytics/ipex-llm/assets/77518229/34510b69-2a58-4a44-aa7c-6b86f4d7b927) ![env-check1](https://github.com/intel-analytics/ipex-llm/assets/77518229/c853d078-d1f2-45e4-a6f1-b1c7060444c1)

We cannot reproduce this issue: memory usage with 2k input is always larger than with 1k input. If the quantized KV cache is used, the longer sequence's second-token latency might outperform the shorter sequence's....
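
To illustrate why a quantized KV cache changes the memory picture as sequence length grows, here is a minimal back-of-the-envelope sketch (not the ipex-llm implementation; the shapes and per-position scale layout are assumptions for illustration only):

```python
# Rough per-layer KV-cache sizing: storing K/V as int8 plus small per-position
# scales instead of fp16 roughly halves the cache footprint, and the saving
# grows with sequence length (2k vs 1k input).
import torch

def kv_cache_bytes(batch, heads, seq_len, head_dim, dtype):
    # One layer's K and V caches: 2 tensors of shape [batch, heads, seq_len, head_dim]
    elem = torch.tensor([], dtype=dtype).element_size()
    return 2 * batch * heads * seq_len * head_dim * elem

def quantized_kv_cache_bytes(batch, heads, seq_len, head_dim):
    # int8 payload plus one fp16 scale per (head, position) for dequantization
    payload = 2 * batch * heads * seq_len * head_dim * 1
    scales = 2 * batch * heads * seq_len * 2
    return payload + scales

for seq_len in (1024, 2048):
    fp16 = kv_cache_bytes(1, 32, seq_len, 128, torch.float16)
    q8 = quantized_kv_cache_bytes(1, 32, seq_len, 128)
    print(f"seq={seq_len}: fp16 KV ~ {fp16 / 2**20:.1f} MiB/layer, int8 KV ~ {q8 / 2**20:.1f} MiB/layer")
```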

I think the pretrained full model is provided in the repo, but it is not that obvious. You can check this [link](https://drive.google.com/drive/folders/15wx9vOM0euyizq-M1uINgN0_wjVRf9J3).
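
If it helps, one possible way to fetch that folder programmatically is sketched below; it assumes the third-party `gdown` package (`pip install gdown`), that the Drive folder is publicly shared, and an output directory name chosen here for illustration:

```python
# Download the pretrained-model folder from the Google Drive link above.
import gdown

folder_url = "https://drive.google.com/drive/folders/15wx9vOM0euyizq-M1uINgN0_wjVRf9J3"
# Saves the folder contents into ./pretrained_model (hypothetical local path).
gdown.download_folder(url=folder_url, output="pretrained_model", quiet=False)
```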