LLMLingua
How to reproduce the multi-document QA results at the 9th position?
My reproduction of the results at position 9 of the NQ dataset from the LongLLMLingua paper, using the prompt compressor, shows a large discrepancy from the reported results. My hyperparameters are set as follows:
args.t was set to True and False in two separate experiments to verify the effectiveness of the contrastive ITC. When args.t is True, accuracy is 63; when args.t is False, accuracy is 69.
Questions:
1. What hyperparameters accurately reproduce the paper's result of approximately 70.8% accuracy (NQ, 2x, 9th)?
2. Why does the contrastive ITC drop so severely under my current settings?
args.ratio is set to 0.5.
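For reference, a minimal sketch of the kind of compression call this setup corresponds to, following the public compress_prompt API and the repo's RAG.ipynb example. The mapping of args.ratio to ratio and of args.t to condition_compare is an assumption, and documents, instruction, and question are placeholders rather than the exact reproduction script:

```python
# Minimal sketch (see caveats above); placeholders, not the exact reproduction script.
from llmlingua import PromptCompressor

documents = ["<passage 1>", "<passage 2>", "..."]  # 20 retrieved passages, gold document at position 9
instruction = "Write a high-quality answer for the given question using only the provided search results."
question = "<NQ question>"

# Small LLaMA-based model as the compressor, as in the repo's examples.
llm_lingua = PromptCompressor("NousResearch/Llama-2-7b-hf", device_map="cuda")

compressed = llm_lingua.compress_prompt(
    documents,
    instruction=instruction,
    question=question,
    ratio=0.5,                           # assumed mapping of args.ratio (2x compression); newer releases call this `rate`
    condition_compare=True,              # assumed mapping of args.t (contrastive, question-aware ITC)
    condition_in_question="after",
    rank_method="longllmlingua",         # question-aware coarse-grained (document-level) ranking
    use_sentence_level_filter=False,
    context_budget="+100",
    dynamic_context_compression_ratio=0.4,
    reorder_context="sort",
)
print(compressed["compressed_prompt"])   # text to send to the target LLM
```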
Just checked the script with @Twilightaaa and found that the main issue lies in the call mode of the LLM; the parameters are largely consistent with those mentioned earlier.
Experiments in LLMLingua and most experiments in LongLLMLingua were conducted in completion mode, whereas chat mode tends to be more sensitive to token-level compression. However, OpenAI has currently disabled GPT-3.5-turbo's completion endpoint; you can use gpt-3.5-turbo-instruct or the Azure OpenAI service instead.
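For illustration, a sketch of the two call modes with the openai>=1.0 Python SDK; temperature and max_tokens are illustrative, and compressed refers to the output of compress_prompt above:

```python
from openai import OpenAI

client = OpenAI()
prompt = compressed["compressed_prompt"]  # compressed prompt from compress_prompt

# Completion mode: gpt-3.5-turbo-instruct still serves the legacy completions endpoint.
completion = client.completions.create(
    model="gpt-3.5-turbo-instruct",
    prompt=prompt,
    temperature=0,
    max_tokens=100,
)
answer = completion.choices[0].text

# Chat mode, for comparison; tends to be more sensitive to token-level compression.
chat = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,
    max_tokens=100,
)
chat_answer = chat.choices[0].message.content
```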
Hi, @Twilightaaa! Could you please share your reproduction script for the experiments at position 9 of the NQ dataset? Thanks!
Hi @yfpeng1234, you can follow https://github.com/microsoft/LLMLingua/blob/main/examples/RAG.ipynb and use the "gpt-3.5-turbo-instruct" model.
Hi, @yfpeng1234! I followed the instructions provided in https://github.com/microsoft/LLMLingua/blob/main/examples/RAG.ipynb and used the "gpt-3.5-turbo-instruct" model without any additional modifications or adjustments.