Params to use for compressing dialogues

vikram71198 opened this issue

Hi,

Thanks for this amazing piece of work. I'm using this framework to compress a prompt whose context is a dialogue between two people. I compress only the dialogue and leave the instruction and question uncompressed.

So far, even with low compression ratios like 0.1-0.15, I'm seeing significant deviation between the outputs for the compressed prompt and those for the original, uncompressed prompt. The compressed prompt itself is also unintelligible in quite a few places. I'm using the same params as you do here, although I'm not entirely sure what context_budget does.

Also, I currently pass the dialogue in as a str. Would it make any difference if I segmented it line by line and passed it in as a List[str]?
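For reference, one way to build such a List[str] is one element per speaker turn rather than per physical line, so that multi-line utterances stay together. This is a minimal plain-Python sketch, assuming the only speaker labels are Agent: and Customer: at the start of a line; split_into_turns is a hypothetical helper, not part of the framework, and whether per-turn segments actually improve compression is the open question above.

```python
import re
from typing import List

# Assumed speaker labels; adjust to whatever roles the transcript uses.
SPEAKER_PATTERN = re.compile(r"^(Agent|Customer):", re.MULTILINE)


def split_into_turns(dialogue: str) -> List[str]:
    """Split a transcript into one string per speaker turn.

    A new turn starts at every line that begins with a speaker label;
    continuation lines without a label stay attached to the previous turn.
    """
    starts = [m.start() for m in SPEAKER_PATTERN.finditer(dialogue)]
    if not starts:
        # No labels found: treat the whole dialogue as a single segment.
        return [dialogue.strip()] if dialogue.strip() else []
    turns = []
    # Keep any text before the first label as its own segment.
    if dialogue[: starts[0]].strip():
        turns.append(dialogue[: starts[0]].strip())
    for begin, end in zip(starts, starts[1:] + [len(dialogue)]):
        turns.append(dialogue[begin:end].strip())
    return turns


dialogue = "Agent: Hi, how can I help?\nCustomer: My order is late.\nIt was due Monday.\nAgent: Let me check."
turns = split_into_turns(dialogue)
# turns == ["Agent: Hi, how can I help?",
#           "Customer: My order is late.\nIt was due Monday.",
#           "Agent: Let me check."]
```

Splitting per turn rather than per line avoids cutting a single utterance into fragments that the compressor would then treat as separate segments.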

The dialogue contains speaker roles such as Agent: and Customer:, which are sometimes dropped during compression. Is there a way to make sure certain tokens are never dropped? I'm guessing that's what force_context_ids is for, but does this param take input_ids after tokenizing? I'm confused.
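Independent of what force_context_ids accepts, one possible workaround is to keep the speaker labels out of the compressor entirely: strip each label, compress only the utterance text, then re-attach the label. The sketch below assumes Agent: and Customer: are the only labels; split_label and compress_keeping_labels are hypothetical helpers, and compress is a placeholder for any str-to-str function (for example, a wrapper around the prompt compressor), not a library API.

```python
from typing import Callable, List, Tuple

LABELS: Tuple[str, ...] = ("Agent:", "Customer:")  # assumed speaker labels


def split_label(turn: str, labels: Tuple[str, ...] = LABELS) -> Tuple[str, str]:
    """Return (label, utterance); label is "" if the turn has none."""
    for label in labels:
        if turn.startswith(label):
            return label, turn[len(label):].lstrip()
    return "", turn


def compress_keeping_labels(turns: List[str], compress: Callable[[str], str]) -> List[str]:
    """Compress only the utterance text, then re-attach the speaker label."""
    out = []
    for turn in turns:
        label, utterance = split_label(turn)
        compressed = compress(utterance)
        # Join label and text, skipping whichever part is empty.
        out.append(" ".join(part for part in (label, compressed) if part))
    return out


# Toy stand-in compressor: keep only words longer than three characters.
def toy(text: str) -> str:
    return " ".join(w for w in text.split() if len(w) > 3)


result = compress_keeping_labels(["Agent: Hi, how can I help?", "Customer: My order is late."], toy)
# result == ["Agent: help?", "Customer: order late."]
```

The trade-off is that compressing turn by turn gives the compressor less surrounding context per call than compressing the whole dialogue at once, so output quality may differ.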

Do you have any suggestions for a good (or optimal) param setting for compressing dialogues?

Jan 18 '24 23:01 vikram71198