LLMLingua
How to reproduce the multi-document QA results at the 9th position?
My reproduction of the results at position 9 of the NQ dataset from the LongLLMLingua paper, using the prompt compressor, shows a large discrepancy from the reported results. My hyperparameters are set as follows:
args.t was set to True and False in two separate experiments to verify the effectiveness of the contrastive ITC. When args.t is True, accuracy is 63; when args.t is False, accuracy is 69.
Questions:
1. What hyperparameters accurately reproduce the paper's result of approximately 70.8% accuracy (NQ, 2x, 9th)?
2. Why does the contrastive ITC drop so severely under my current settings?
args.ratio is set to 0.5.
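For reference, a minimal sketch of the kind of compression call this setup corresponds to, following the public compress_prompt API and the repo's RAG.ipynb example. The mapping of args.ratio to ratio and of args.t to condition_compare is an assumption, and documents, instruction, and question are placeholders rather than the exact reproduction script:

```python
# Minimal sketch (see caveats above); placeholders, not the exact reproduction script.
from llmlingua import PromptCompressor

documents = ["<passage 1>", "<passage 2>", "..."]  # 20 retrieved passages, gold document at position 9
instruction = "Write a high-quality answer for the given question using only the provided search results."
question = "<NQ question>"

# Small LLaMA-based model as the compressor, as in the repo's examples.
llm_lingua = PromptCompressor("NousResearch/Llama-2-7b-hf", device_map="cuda")

compressed = llm_lingua.compress_prompt(
    documents,
    instruction=instruction,
    question=question,
    ratio=0.5,                           # assumed mapping of args.ratio (2x compression); newer releases call this `rate`
    condition_compare=True,              # assumed mapping of args.t (contrastive, question-aware ITC)
    condition_in_question="after",
    rank_method="longllmlingua",         # question-aware coarse-grained (document-level) ranking
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4,
    reorder_context="sort",
)
print(compressed["compressed_prompt"])   # text to send to the target LLM
```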
Just checked the script with @Twilightaaa and found that the main issue lies in the call mode of the LLM; the parameters are largely consistent with those mentioned earlier.
Experiments in LLMLingua and most experiments in LongLLMLingua were conducted in completion mode, whereas chat mode tends to be more sensitive to token-level compression. However, OpenAI has currently disabled GPT-3.5-turbo's completion endpoint; you can use gpt-3.5-turbo-instruct or the Azure OpenAI service instead.
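For illustration, a sketch of the two call modes with the openai>=1.0 Python SDK; temperature and max_tokens are illustrative, and compressed refers to the output of compress_prompt above:

```python
from openai import OpenAI

client = OpenAI()
prompt = compressed["compressed_prompt"]  # compressed prompt from compress_prompt

# Completion mode: gpt-3.5-turbo-instruct still serves the legacy completions endpoint.
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    temperature=0,
    max_tokens=100,
)
answer = completion.choices[0].text

# Chat mode, for comparison; tends to be more sensitive to token-level compression.
chat = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
    max_tokens=100,
)
chat_answer = chat.choices[0].message.content
```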
Hi, @Twilightaaa! Could you please share your reproduction script for the experiments at position 9 of the NQ dataset? Thanks!
Hi @yfpeng1234, you can follow https://github.com/microsoft/LLMLingua/blob/main/examples/RAG.ipynb and use the "gpt-3.5-turbo-instruct" model.
Hi, @yfpeng1234! I followed the instructions provided in https://github.com/microsoft/LLMLingua/blob/main/examples/RAG.ipynb and used the "gpt-3.5-turbo-instruct" model without any additional modifications or adjustments.