
Text Compression Transform

Open WaelKarkoub opened this issue 1 year ago • 18 comments

Why are these changes needed?

This PR introduces text compression by leveraging the LLMLingua library. This addition enhances processing efficiency and response speed by reducing token usage in large language models.

NOTE: LLMLingua uses locally hosted models, so caching might be important here.
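For readers of this thread, a minimal sketch of how the new transform is intended to be wired up, based on the class and module names that appear later in this PR (TextMessageCompressor in transforms.py, LLMLingua in text_compressors.py, and the TransformMessages capability); the exact constructor defaults are assumptions:

    from autogen.agentchat.contrib.capabilities import transform_messages
    from autogen.agentchat.contrib.capabilities.text_compressors import LLMLingua
    from autogen.agentchat.contrib.capabilities.transforms import TextMessageCompressor

    # LLMLingua wraps a locally hosted compression model, hence the note about caching.
    text_compressor = TextMessageCompressor(text_compressor=LLMLingua())

    # Attach the compression transform to an agent via the TransformMessages capability.
    context_handling = transform_messages.TransformMessages(transforms=[text_compressor])
    # context_handling.add_to_agent(agent)  # agent is any ConversableAgent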

Future work:

  • Image Compression
  • Video Compression

Related issue number

Closes #2538

Checks

  • [ ] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
  • [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
  • [x] I've made sure all auto checks have passed.

WaelKarkoub avatar Mar 31 '24 19:03 WaelKarkoub

Codecov Report

Attention: Patch coverage is 25.96154% with 77 lines in your changes missing coverage. Please review.

Project coverage is 45.11%. Comparing base (ded2d61) to head (ec6fe57). Report is 35 commits behind head on main.

Files Patch % Lines
...togen/agentchat/contrib/capabilities/transforms.py 20.23% 67 Missing :warning:
...agentchat/contrib/capabilities/text_compressors.py 50.00% 10 Missing :warning:
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #2225       +/-   ##
===========================================
+ Coverage   33.33%   45.11%   +11.77%     
===========================================
  Files          83       86        +3     
  Lines        8636     9108      +472     
  Branches     1835     2090      +255     
===========================================
+ Hits         2879     4109     +1230     
+ Misses       5516     4651      -865     
- Partials      241      348      +107     
Flag Coverage Δ
unittest 12.61% <25.96%> (?)
unittests 44.36% <0.00%> (+11.03%) :arrow_up:

Flags with carried forward coverage won't be shown.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Mar 31 '24 19:03 codecov-commenter

@sonichi I'll open a PR to handle deprecation and add a topic. The plan is to merge the deprecation PR before this one to simplify the workflow.

WaelKarkoub avatar Apr 01 '24 01:04 WaelKarkoub

There should be an option of not running LLMLingua on the same machine. When deployed on servers, we should be able to run GPU-bound tasks on specialized hardware, rather than on the same machine on which we run the agents, which are IO-bound.

davorrunje avatar Apr 02 '24 20:04 davorrunje

Hi @davorrunje, do you have any recommendations for offloading Python workloads to remote machines? I don't have much experience with it, although I do have some experience with Ray. I'm not sure if I can implement it, though. Another idea I had was to serve LLMLingua on the remote machine and simply request compressed messages. What do you think?

WaelKarkoub avatar Apr 02 '24 21:04 WaelKarkoub

Hi @davorrunje, do you have any recommendations for offloading Python workloads to remote machines? I don't have much experience with it, although I do have some experience with Ray. I'm not sure if I can implement it, though. Another idea I had was to serve LLMLingua on the remote machine and simply request compressed messages. What do you think?

I would suggest abstracting the mechanism using a protocol and then implementing the protocol with local deployment. That way others can easily replace local with distributed deployment, you don't need to do it yourself.
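A minimal sketch of the kind of abstraction being suggested, assuming a simple structural protocol; the names below (TextCompressor, compress_text, LocalLLMLingua, RemoteCompressor) are illustrative, not the final API:

    from typing import Any, Dict, Protocol


    class TextCompressor(Protocol):
        """Anything that can compress text, whether it runs locally or remotely."""

        def compress_text(self, text: str, **compression_params: Any) -> Dict[str, Any]:
            ...


    class LocalLLMLingua:
        """Local implementation backed by llmlingua's PromptCompressor (GPU-bound)."""

        def __init__(self, **model_kwargs: Any) -> None:
            from llmlingua import PromptCompressor

            self._compressor = PromptCompressor(**model_kwargs)

        def compress_text(self, text: str, **compression_params: Any) -> Dict[str, Any]:
            return self._compressor.compress_prompt(text, **compression_params)


    class RemoteCompressor:
        """Drop-in replacement that calls a compression service on another machine."""

        def __init__(self, endpoint: str) -> None:
            self._endpoint = endpoint

        def compress_text(self, text: str, **compression_params: Any) -> Dict[str, Any]:
            import requests

            resp = requests.post(self._endpoint, json={"text": text, **compression_params})
            resp.raise_for_status()
            return resp.json()

With this split, the agent-side transform only depends on the protocol, and a local deployment can later be swapped for a distributed one without touching the transform code.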

davorrunje avatar Apr 03 '24 05:04 davorrunje

⚠️ GitGuardian has uncovered 5 secrets following the scan of your pull request.

Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.

🔎 Detected hardcoded secrets in your pull request
GitGuardian id GitGuardian status Secret Commit Filename
10404693 Triggered Generic High Entropy Secret 73b80927eef01525ccf46302f6b71209c7ed8220 test/oai/test_utils.py View secret
10404662 Triggered Generic CLI Secret d1e55c404c1b14fcd178433b107332fddfb84898 .github/workflows/dotnet-release.yml View secret
10404694 Triggered Generic High Entropy Secret 73b80927eef01525ccf46302f6b71209c7ed8220 test/oai/test_utils.py View secret
10404696 Triggered Generic High Entropy Secret 73b80927eef01525ccf46302f6b71209c7ed8220 test/oai/test_utils.py View secret
10422482 Triggered Generic High Entropy Secret 73b80927eef01525ccf46302f6b71209c7ed8220 test/oai/test_utils.py View secret
🛠 Guidelines to remediate hardcoded secrets
  1. Understand the implications of revoking this secret by investigating where it is used in your code.
  2. Replace and store your secrets safely, following best practices.
  3. Revoke and rotate these secrets.
  4. If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.

🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

gitguardian[bot] avatar Apr 22 '24 16:04 gitguardian[bot]

@gagb @sonichi I found some time to finalize text compression. I finalized the protocol and added a new user guide. When this PR gets merged, we can plan to target an autogen version to remove the old code.

WaelKarkoub avatar May 01 '24 17:05 WaelKarkoub

@marklysze Just added cache support, let me know what you think of the implementation

WaelKarkoub avatar May 04 '24 00:05 WaelKarkoub

@marklysze Just added cache support, let me know what you think of the implementation

Wow, that was fast! I'll check it out :) much appreciated.

marklysze avatar May 04 '24 01:05 marklysze

Just a question, not related specifically to text compression but to TransformMessages: is it possible to allow passing in a single transform as well as the dictionary of them? So, if just one is passed in, it can be added to a dictionary and the code works as is.

E.g. context_handling = transform_messages.TransformMessages(transforms=text_compressor) where the _transforms checks if it's a dictionary and, if not, just puts the object into a dictionary

Just caters for users forgetting that it needs to be a dictionary (like me).
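A hypothetical sketch of that normalization; the helper name is made up, and the container is written as a list here to mirror the transforms shown elsewhere in this thread, but the wrapping idea is the same either way:

    from typing import Any, List


    def _normalize_transforms(transforms: Any) -> List[Any]:
        """Accept either a single transform or a collection of transforms."""
        if transforms is None:
            return []
        if isinstance(transforms, (list, tuple)):
            return list(transforms)
        return [transforms]  # a lone transform was passed in


    # Hypothetical usage inside TransformMessages.__init__:
    # self._transforms = _normalize_transforms(transforms)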

marklysze avatar May 04 '24 05:05 marklysze

When using LLMLingua, is there any way to suppress the warning: Token indices sequence length is longer than the specified maximum sequence length for this model (521 > 512). Running this sequence through the model will result in indexing errors

I tried verbose=False on the TransformMessages but it still came through. I'm not sure whether we can control that.

marklysze avatar May 04 '24 05:05 marklysze

What would be the best way to avoid text compression on certain messages?

E.g. for a debating scenario group chat, I added the TextCompression to the select speaker (auto) functionality and I noticed that the debate question: Please debate the proposition 'Cats make great pets.' was being compressed as: debate proposition Cats pets.

This may just pass in the Select Speaker case but I can see that the user may want to avoid compressing key messages. My initial thoughts are that it would be good to be able to provide the option to ignore compressing role='system' prompts as they are really crucial and not generally that long. Perhaps also the ability to ignore compressing where name='abc'/'def', etc.

There may be a better way to handle this but would like your thoughts on whether it should be built in or handled through a custom TransformMessage.
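One way this could be handled today without building it into the library is a custom wrapper transform; everything below is a hypothetical sketch (the apply_transform method name follows the transforms introduced in this PR, but treat the rest as assumptions):

    from copy import deepcopy
    from typing import Dict, List


    class SelectiveCompressor:
        """Wraps another transform and leaves protected messages untouched."""

        def __init__(self, inner_transform, protected_roles=("system",), protected_names=()):
            self._inner = inner_transform
            self._protected_roles = set(protected_roles)
            self._protected_names = set(protected_names)

        def apply_transform(self, messages: List[Dict]) -> List[Dict]:
            protected, to_compress = {}, []
            for idx, message in enumerate(messages):
                if message.get("role") in self._protected_roles or message.get("name") in self._protected_names:
                    protected[idx] = deepcopy(message)
                else:
                    to_compress.append(message)
            compressed = iter(self._inner.apply_transform(to_compress))
            # Reassemble in the original order, keeping protected messages as-is.
            return [protected[i] if i in protected else next(compressed) for i in range(len(messages))]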

marklysze avatar May 04 '24 06:05 marklysze

@marklysze The compressed text might not make sense to humans, but it makes sense to an LLM, at least that's what the research behind LLMLingua suggests. You can always improve the performance by using a larger model as well, like https://huggingface.co/microsoft/llmlingua-2-xlm-roberta-large-meetingbank. You can also implement a custom text compressor, like the agent summarizer you suggested, and in the prompt ask it not to modify any key information.

My initial thoughts are that it would be good to be able to provide the option to ignore compressing role='system' prompts as they are really crucial and not generally that long.

With the TransformMessages capability, the system prompt is ignored from any transformation by default: https://github.com/microsoft/autogen/blob/3a4bb088f70776b7a3ca847f487926b8d2e6619b/autogen/agentchat/contrib/capabilities/transform_messages.py#L68-L70

Perhaps also the ability to ignore compressing where name='abc'/'def', etc.

I agree with your suggestion. How about we add this to our backlog for now and revisit it after gathering more user feedback? I don't have a good sense of what users want from message text compressors as this feature hasn't been released yet

WaelKarkoub avatar May 04 '24 13:05 WaelKarkoub

@marklysze Also just fyi, you can add custom instructions to llmlingua:

https://github.com/microsoft/LLMLingua/blob/40ac969a82f162b3eb0b8e1f1416756d442e4eec/llmlingua/prompt_compressor.py#L424-L427

Which you could specify as an option in compression_args in the constructor of TextMessageCompressor
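A hedged example of what that could look like; compression_args is the constructor option named in this thread, and instruction / target_token are parameters of LLMLingua's compress_prompt, so the exact values here are illustrative:

    from autogen.agentchat.contrib.capabilities.text_compressors import LLMLingua
    from autogen.agentchat.contrib.capabilities.transforms import TextMessageCompressor

    # compression_args are forwarded to the underlying compress_prompt call,
    # so a custom instruction and a token budget can be supplied up front.
    text_compressor = TextMessageCompressor(
        text_compressor=LLMLingua(),
        compression_args={
            "instruction": "Keep agent names, numbers, and role labels exactly as written.",
            "target_token": 1000,
        },
    )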

WaelKarkoub avatar May 04 '24 13:05 WaelKarkoub

@marklysze The compressed text might not make sense to humans, but it makes sense to an LLM, at least that's what the research behind LLMLingua suggests. You can always improve the performance by using a larger model as well, like https://huggingface.co/microsoft/llmlingua-2-xlm-roberta-large-meetingbank. You can also implement a custom text compressor, like the agent summarizer you suggested, and in the prompt ask it not to modify any key information.

My initial thoughts are that it would be good to be able to provide the option to ignore compressing role='system' prompts as they are really crucial and not generally that long.

With the TransformMessages capability, the system prompt is ignored from any transformation by default:

https://github.com/microsoft/autogen/blob/3a4bb088f70776b7a3ca847f487926b8d2e6619b/autogen/agentchat/contrib/capabilities/transform_messages.py#L68-L70

Perhaps also the ability to ignore compressing where name='abc'/'def', etc.

I agree with your suggestion. How about we add this to our backlog for now and revisit it after gathering more user feedback? I don't have a good sense of what users want from message text compressors as this feature hasn't been released yet

Thanks @WaelKarkoub, I'll continue testing but I can see that it's changing agent names as well and that's going to be problematic. Perhaps a different model or different parameters can fix that for select speaker.

For example: Read the above conversation. Then select ONLY THE NAME of the next speaker from ['Debate_Moderator_Agent', 'Affirmative_Constructive_Debater', 'Negative_Constructive_Debater', 'Affirmative_Rebuttal_Debater', 'Negative_Rebuttal_Debater', 'Debate_Judge'] to speak. Do not explain why.

Is compressed to: Read above conversation select NAME next speaker Debate _ Moderator _ Agent Affirmative Constructive Debater Negative Debater Affirmative Rebuttal Debater Negative Debate _ Judge speak explain why

With agent names needing to be relatively precise for selection, we'll need to avoid compressing agent names.


Thanks for highlighting the if messages[0]["role"] == "system": check; unfortunately, in my test debating example the prompt is not the first message but the second, and hence is being compressed.

"[{'content': 'debate proposition Dogs better pets', 'role': 'user', 'name': 'Debate_Moderator_Agent'}, {'content': 'Read above conversation select NAME next speaker Debate _ Moderator _ Agent Affirmative Constructive Debater Negative Debater Affirmative Rebuttal Debater Negative Debate _ Judge speak explain why', 'role': 'system'}]"

I'll test with larger models as suggested and check if there are any parameters that can help.

marklysze avatar May 04 '24 20:05 marklysze

@marklysze Also just fyi, you can add custom instructions to llmlingua:

https://github.com/microsoft/LLMLingua/blob/40ac969a82f162b3eb0b8e1f1416756d442e4eec/llmlingua/prompt_compressor.py#L424-L427

Which you could specify as an option in compression_args in the constructor of TextMessageCompressor

Great! I see target_token and that sounds like something very useful. I'll give that a go.

marklysze avatar May 04 '24 20:05 marklysze

@marklysze Also just fyi, you can add custom instructions to llmlingua: https://github.com/microsoft/LLMLingua/blob/40ac969a82f162b3eb0b8e1f1416756d442e4eec/llmlingua/prompt_compressor.py#L424-L427 Which you could specify as an option in compression_args in the constructor of TextMessageCompressor

Great! I see target_token and that sounds like something very useful. I'll give that a go.

Just an update that using target_token was effective in targeting a specific token count for compression: text_compressor = TextMessageCompressor(text_compressor=LLMLingua(prompt_compressor_kwargs=compression_config), compression_args={"target_token": 10000})

Thanks @WaelKarkoub!

marklysze avatar May 05 '24 03:05 marklysze

Just noticed that if a message content is empty, then I don't think it should check the cache or compress. So could we also check here if the content is '':

        for message in processed_messages:
            # Some messages may not have content.
            if not isinstance(message.get("content"), (str, list)):
                continue

Just a note that trying to get the cached value using an empty string was returning a non-empty string.
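A sketch of the suggested guard against the loop quoted above (illustrative only, not the final code):

    for message in processed_messages:
        # Some messages may not have content, and empty strings should neither
        # hit the cache nor be sent to the compressor.
        content = message.get("content")
        if not isinstance(content, (str, list)) or content == "":
            continue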

marklysze avatar May 05 '24 03:05 marklysze