LLMLingua
The specific compressor parameter settings for reproducing NQ
Very nice work! I am trying to replicate the LongLLMLingua results on the Natural Questions dataset, but my results differ from those in the paper, likely because it is unclear what value each compressor parameter should be set to. Could you share the specific parameter settings used in the compressor?
Hi @ignorejjj, thank you for your support for our work.
No problem, I will post the parameters we used in the experiments below for reference. We will also release the relevant code after the review.
The 2x compression ratio uses:
```python
import json
from copy import deepcopy

from tqdm import tqdm
from xopen import xopen

# `Document`, `get_qa_prompt`, `path`, `doc_num`, `idx`, and the
# `llm_lingua` compressor instance are assumed to be defined, following
# the "lost in the middle" repository.

res = []
with xopen(path) as f:
    for ii, jj in tqdm(enumerate(f), total=2655):
        if ii < len(res):  # skip already-processed examples when resuming
            continue
        input_example = json.loads(jj)
        question = input_example["question"]
        documents = []
        for ctx in deepcopy(input_example["ctxs"]):
            documents.append(Document.from_dict(ctx))
        prompt = get_qa_prompt(
            question,
            documents,
            mention_random_ordering=False,
            query_aware_contextualization=False,
        )
        # Split the prompt into instruction / documents / question.
        c = prompt.split("\n\n")
        instruction, question = c[0], c[-1]
        demonstration = "\n".join(c[1:-1])
        compressed_prompt = llm_lingua.compress_prompt(
            demonstration.split("\n"),
            instruction,
            question,
            0.55,
            use_sentence_level_filter=False,
            condition_in_question="after_condition",
            reorder_context="sort",
            dynamic_context_compression_ratio=0.3,
            condition_compare=True,
            context_budget="+100",
            token_budget_ratio=1.05,
            rank_method="longllmlingua",
        )
        res.append({"id": ii, "prompt": compressed_prompt, "answer": input_example["answers"]})
json.dump(res, open(f"prompt/loss_in_middle/ours_{doc_num}_{idx}_2x_dem_after_add_prompt1_dy03dem_sort.json", "w"))
```
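As a side note, the instruction/demonstrations/question split in the loop above relies on the QA prompt blocks being separated by double newlines. A toy illustration of just that split (the prompt text below is invented, not the actual lost-in-the-middle template):

```python
# Toy illustration of the prompt split; the prompt text here is made up.
prompt = (
    "Answer the question using the documents.\n\n"
    "Document [1] (Title: A) Some text.\n\n"
    "Document [2] (Title: B) More text.\n\n"
    "Question: who wrote it?\nAnswer:"
)
c = prompt.split("\n\n")
instruction, question = c[0], c[-1]  # first and last blocks
demonstration = "\n".join(c[1:-1])   # the documents in between

# instruction   -> "Answer the question using the documents."
# question      -> "Question: who wrote it?\nAnswer:"
# demonstration -> the two Document lines joined by "\n"
```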
The 4x compression ratio uses:
```python
compressed_prompt = llm_lingua.compress_prompt(
    demonstration.split("\n"),
    instruction,
    question,
    0.75,
    use_sentence_level_filter=False,
    condition_in_question="after_condition",
    reorder_context="sort",
    dynamic_context_compression_ratio=0.4,
    condition_compare=True,
    context_budget="*1.2",
    token_budget_ratio=1.05,
    rank_method="longllmlingua",
)
```
If you have more questions, feel free to reply and discuss.
Thanks for your quick reply!
I really appreciate your awesome work. Could you please provide the code for evaluation, including batched inference?
Hi @zhyunlong, thank you for your support with LLMLingua.
We use the same evaluation script as "lost in the middle". You can access the script at this link.
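For readers without the script at hand: the lost-in-the-middle evaluation scores open-domain QA with a "best subspan EM" metric, i.e. a normalized substring match between each gold answer and the model output. A simplified sketch of that metric (SQuAD-style normalization; not the exact script):

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def best_subspan_em(prediction: str, gold_answers: list[str]) -> float:
    """1.0 if any normalized gold answer appears inside the prediction."""
    pred = normalize(prediction)
    return float(any(normalize(g) in pred for g in gold_answers))

print(best_subspan_em("The answer is New York City.", ["new york"]))  # 1.0
print(best_subspan_em("I don't know.", ["new york"]))                 # 0.0
```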