KIVI
Unable to Reproduce Results for LongBench
Hello,
I ran the provided LongBench code with the Llama-3-8B-Instruct model but could not reproduce the results reported in Table 8 of your paper. Specifically, the full-precision baseline scores 32.11 on Qasper in my run, whereas the reported score is 44.24.
I used the following command to run the model:
python pred_long_bench.py --model_name_or_path meta-llama/Meta-Llama-3-8B-Instruct --k_bits 16 --v_bits 16
Is there anything I might be missing?