[Question]: reproducing LongLLMLingua on the LongBench dataset.

Open • junepark1 opened this issue 1 year ago • 1 comment

Describe the issue

Thank you for your work.

I tried to reproduce LongLLMLingua on the LongBench dataset.

I found https://github.com/microsoft/LLMLingua/blob/main/examples/Code.ipynb. It seems to be the code for reproducing one of the LongBench datasets, repobench-p.

I have two questions.

1. In the paper, you said you did not use the reordering strategy, but I think this notebook does use it. When reproducing the results, should I apply the reordering strategy for each dataset?
2. Can I apply the same parameters as in this notebook to all LongBench datasets? If each dataset uses different parameters, could you share the parameters for each dataset?

Thank you!

Apr 03 '24 07:04 junepark1

Hi @junepark1, apologies for the late response,

1. Yes, you need to disable the reranker for now. We will update the results with the reranker enabled in the future.
2. You can use the same parameters for all tasks. For other LongBench-related logic, you can refer to https://github.com/microsoft/LLMLingua/blob/main/experiments/llmlingua2/evaluation/eval_longbench.py

Apr 07 '24 07:04 iofu728
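
For anyone following this thread, here is a minimal sketch of what the answer above could look like in code. It is not the maintainers' script. It assumes that "reranker" in the reply refers to the `reorder_context` argument of `PromptCompressor.compress_prompt`, that `"original"` keeps the input order while `"sort"` reorders, and that the remaining values match the call in Code.ipynb; please check them against the notebook before relying on them. The helper names `longllmlingua_kwargs` and `compress_task` are hypothetical.

```python
from typing import Any, Dict, List


def longllmlingua_kwargs(reorder: bool = False, target_token: int = 2000) -> Dict[str, Any]:
    """Build one shared set of compress_prompt arguments for every LongBench task.

    Assumption: these values mirror the call in examples/Code.ipynb,
    except that reordering is switched off by default.
    """
    return {
        "target_token": target_token,
        "condition_compare": True,
        "condition_in_question": "after",
        "rank_method": "longllmlingua",
        "use_sentence_level_filter": False,
        "context_budget": "+100",
        "dynamic_context_compression_ratio": 0.4,
        # Assumption: "original" disables reordering, "sort" enables it.
        "reorder_context": "sort" if reorder else "original",
    }


def compress_task(compressor: Any, contexts: List[str], instruction: str, question: str) -> Any:
    """Compress one sample with the shared parameters and reordering disabled."""
    return compressor.compress_prompt(
        contexts,
        instruction=instruction,
        question=question,
        **longllmlingua_kwargs(reorder=False),
    )


# Usage (commented out because it downloads a model):
# from llmlingua import PromptCompressor
# compressor = PromptCompressor()
# result = compress_task(compressor, contexts, instruction, question)
# prompt = result["compressed_prompt"]
```

Keeping the arguments in a single helper makes it easy to run every LongBench task with identical settings and to flip `reorder` later if results with reordering enabled are published.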