
Meaningless tokens generation

Open · aravindpai opened this issue 1 year ago · 3 comments

Describe the bug

LongLLMLingua is generating meaningless tokens by combining fragments of different words.

For example, "Fuel Dilution" is combined into "uedil".

Why does this happen?

Steps to reproduce

No response

Expected Behavior

No response

Logs

No response

Additional Information

No response

aravindpai, May 09 '24 07:05

Hi @aravindpai, thanks for your support. Yes, this is expected: LLMLingua compresses at the token level, so it can drop some tokens of a word while keeping others, and the surviving tokens are joined into a new string. If you need to preserve specific sensitive words, such as names of people or places, you can use the recovery function to restore the relevant content.

iofu728, May 10 '24 08:05
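A toy illustration of why token-level compression produces strings like this. It assumes a hypothetical subword split of " Fuel Dilution" (a real tokenizer may split it differently) and a hand-picked set of kept tokens, not the small language model's actual selection:

```python
def drop_tokens(tokens, keep):
    """Keep only the tokens whose indices are in `keep` and join them back into text.

    Subword tokens are concatenated directly, so surviving pieces of
    neighbouring words fuse into a single meaningless string.
    """
    return "".join(tok for i, tok in enumerate(tokens) if i in keep)


# Hypothetical subword split of " Fuel Dilution"; a leading space marks a word start.
tokens = [" F", "uel", " D", "il", "ution"]

# Keeping only "uel" and "il" fuses fragments of two different words.
print(drop_tokens(tokens, {1, 3}))  # uelil
```

Nothing in the compressed output marks where the original word boundaries were, which is why a separate recovery step is needed to map such fragments back.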

Hi @iofu728, what is the recovery function, and how do I restore the relevant content?

aravindpai, May 10 '24 08:05

Hi @aravindpai, you can refer to this document https://github.com/microsoft/LLMLingua/blob/main/DOCUMENT.md#post-processing for how to use the `recover` function.

iofu728, May 10 '24 08:05
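In the linked document, post-processing is done by calling the compressor's `recover` method with the original prompt, the compressed prompt, and the LLM's response; check DOCUMENT.md for the exact signature. The sketch below is NOT LLMLingua's implementation. It only illustrates the idea under toy assumptions: a hypothetical token list for the original text, the indices of the tokens kept by compression, and the convention that a token starting with a space begins a new word. The helpers `word_span` and `recover_fragment` are hypothetical names.

```python
def word_span(tokens, first, last):
    """Widen the token range [first, last] to whole words and return the original text."""
    # Walk left until we reach a token that starts a word (leading space) or the start.
    while first > 0 and not tokens[first].startswith(" "):
        first -= 1
    # Walk right while the next token continues the current word.
    while last + 1 < len(tokens) and not tokens[last + 1].startswith(" "):
        last += 1
    return "".join(tokens[first:last + 1]).strip()


def recover_fragment(tokens, kept, fragment):
    """Map a fragment of compressed text back to the original words it came from.

    Tries every run of consecutive kept tokens; if their concatenation equals
    `fragment`, returns the surrounding original words. Otherwise the fragment
    is returned unchanged.
    """
    kept = sorted(kept)
    for i in range(len(kept)):
        text = ""
        for j in range(i, len(kept)):
            text += tokens[kept[j]]
            if text == fragment:
                return word_span(tokens, kept[i], kept[j])
            if not fragment.startswith(text):
                break  # this run can no longer match; try the next start
    return fragment


# Hypothetical original tokens and the indices kept by compression.
tokens = [" F", "uel", " D", "il", "ution"]
kept = {1, 3}

response = "The main risk is uelil."
print(response.replace("uelil", recover_fragment(tokens, kept, "uelil")))
# The main risk is Fuel Dilution.
```

Expanding to whole words is a design choice of this sketch: a matched fragment usually covers only parts of words, and the reader needs the complete original terms.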