Text Compression Transform
Why are these changes needed?
This PR introduces text compression by leveraging the LLMLingua library. This addition enhances processing efficiency and response speed by reducing token usage in large language models.
NOTE: LLMLingua uses locally hosted models, so caching might be important here.
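A rough usage sketch, assuming the new transform plugs into the existing TransformMessages capability (module paths follow the files touched in this PR; the `add_to_agent` call is illustrative):

```python
from autogen.agentchat.contrib.capabilities.text_compressors import LLMLingua
from autogen.agentchat.contrib.capabilities.transform_messages import TransformMessages
from autogen.agentchat.contrib.capabilities.transforms import TextMessageCompressor

# Compress long message histories with a locally hosted LLMLingua model
# before they are sent to the LLM.
compressor = TextMessageCompressor(text_compressor=LLMLingua())
context_handling = TransformMessages(transforms=[compressor])
# context_handling.add_to_agent(agent)  # attach the capability to an existing agent
```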
Future work:
- Image Compression
- Video Compression
Related issue number
Closes #2538
Checks
- [ ] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [x] I've made sure all auto checks have passed.
Codecov Report
Attention: Patch coverage is 25.96154%, with 77 lines in your changes missing coverage. Please review.
Project coverage is 45.11%. Comparing base (ded2d61) to head (ec6fe57). Report is 35 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| ...togen/agentchat/contrib/capabilities/transforms.py | 20.23% | 67 Missing :warning: |
| ...agentchat/contrib/capabilities/text_compressors.py | 50.00% | 10 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #2225 +/- ##
===========================================
+ Coverage 33.33% 45.11% +11.77%
===========================================
Files 83 86 +3
Lines 8636 9108 +472
Branches 1835 2090 +255
===========================================
+ Hits 2879 4109 +1230
+ Misses 5516 4651 -865
- Partials 241 348 +107
| Flag | Coverage Δ | |
|---|---|---|
| unittest | 12.61% <25.96%> (?) | |
| unittests | 44.36% <0.00%> (+11.03%) :arrow_up: | |
@sonichi I'll open a PR to handle deprecation and add a topic. The plan is to merge the deprecation PR before this one to simplify the workflow
There should be an option to not run LLMLingua on the same machine. When deployed on servers, we should be able to run GPU-bound tasks on specialized hardware, not on the same machine where we run agents, which are IO-bound.
Hi @davorrunje, do you have any recommendations for offloading Python workloads to remote machines? I don't have much experience with it, although I do have some experience with Ray. I'm not sure if I can implement it, though. Another idea I had was to serve LLM Lingua on the remote machine and simply request compressed messages. What do you think?
I would suggest abstracting the mechanism using a protocol and then implementing the protocol with local deployment. That way others can easily replace local with distributed deployment, you don't need to do it yourself.
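For illustration, a minimal sketch of that kind of abstraction, assuming a `compress_text` method and llmlingua's `PromptCompressor` underneath (the names are illustrative, not necessarily what this PR ships):

```python
from typing import Any, Dict, Optional, Protocol


class TextCompressor(Protocol):
    """Anything that can compress a piece of text."""

    def compress_text(self, text: str, **compression_params) -> Dict[str, Any]:
        """Returns a dict that includes the compressed text, e.g. under 'compressed_prompt'."""
        ...


class LLMLingua:
    """Local implementation backed by llmlingua's PromptCompressor."""

    def __init__(self, prompt_compressor_kwargs: Optional[Dict] = None) -> None:
        # Imported lazily so llmlingua stays an optional dependency.
        from llmlingua import PromptCompressor

        self._compressor = PromptCompressor(**(prompt_compressor_kwargs or {}))

    def compress_text(self, text: str, **compression_params) -> Dict[str, Any]:
        # Runs the locally hosted model; a remote implementation could satisfy
        # the same protocol by calling a compression service instead.
        return self._compressor.compress_prompt(text, **compression_params)
```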
⚠️ GitGuardian has uncovered 5 secrets following the scan of your pull request.
Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.
🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 10404693 | Triggered | Generic High Entropy Secret | 73b80927eef01525ccf46302f6b71209c7ed8220 | test/oai/test_utils.py | View secret |
| 10404662 | Triggered | Generic CLI Secret | d1e55c404c1b14fcd178433b107332fddfb84898 | .github/workflows/dotnet-release.yml | View secret |
| 10404694 | Triggered | Generic High Entropy Secret | 73b80927eef01525ccf46302f6b71209c7ed8220 | test/oai/test_utils.py | View secret |
| 10404696 | Triggered | Generic High Entropy Secret | 73b80927eef01525ccf46302f6b71209c7ed8220 | test/oai/test_utils.py | View secret |
| 10422482 | Triggered | Generic High Entropy Secret | 73b80927eef01525ccf46302f6b71209c7ed8220 | test/oai/test_utils.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn the best practices here.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
@gagb @sonichi I found some time to finalize text compression. I finalized the protocol and added a new user guide. When this PR gets merged, we can plan to target an autogen version for removing the old code.
@marklysze Just added cache support, let me know what you think of the implementation
Wow, that was fast! I'll check it out :) much appreciated.
Just a question, not related specifically to text compression but to TransformMessages: is it possible to allow passing in a single transform as well as a list of them? So, if they pass in just one, it can be wrapped in a list and the code works as is.
E.g.
`context_handling = transform_messages.TransformMessages(transforms=text_compressor)`
where `_transforms` checks whether it's a list and, if not, just puts the object into a list.
Just caters for users forgetting that it needs to be a list (like me).
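For illustration, a minimal sketch of that convenience (a hypothetical helper; the real constructor may differ):

```python
from typing import Any, List, Union


def _normalize_transforms(transforms: Union[Any, List[Any]]) -> List[Any]:
    # Hypothetical helper for TransformMessages.__init__: wrap a single
    # transform in a list so the rest of the code can keep working with lists.
    if not isinstance(transforms, list):
        return [transforms]
    return transforms
```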
When using LLMLingua, is there any way to suppress the warning:
Token indices sequence length is longer than the specified maximum sequence length for this model (521 > 512). Running this sequence through the model will result in indexing errors
I tried `verbose=False` on the TransformMessages but it still came through. I'm not sure we can control that?
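If that warning comes from the Hugging Face tokenizer that LLMLingua uses under the hood (my assumption, not something verified in this PR), one workaround might be to lower the transformers log level before compressing:

```python
# Assumption: the "Token indices sequence length..." warning is emitted by the
# transformers tokenizer, so silencing that library's logger may hide it.
from transformers import logging as hf_logging

hf_logging.set_verbosity_error()  # only show errors from transformers
```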
What would be the best way to avoid text compression on certain messages?
E.g. for a debating scenario group chat, I added the TextCompression to the select speaker (auto) functionality and I noticed that the debate question:
Please debate the proposition 'Cats make great pets.
was being compressed as:
debate proposition Cats pets.
This may just pass in the Select Speaker case but I can see that the user may want to avoid compressing key messages. My initial thoughts are that it would be good to be able to provide the option to ignore compressing role='system' prompts as they are really crucial and not generally that long. Perhaps also the ability to ignore compressing where name='abc'/'def', etc.
There may be a better way to handle this but would like your thoughts on whether it should be built in or handled through a custom TransformMessage.
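As one possible custom-transform approach: a thin wrapper that skips messages by role or name before delegating to the compressor. A sketch, assuming transforms expose an `apply_transform(messages)` method that preserves message order and count (names here are illustrative):

```python
from typing import Dict, List


class FilteredCompressor:
    """Wraps a transform and leaves messages with excluded roles/names untouched."""

    def __init__(self, transform, exclude_roles=("system",), exclude_names=()):
        self._transform = transform
        self._exclude_roles = set(exclude_roles)
        self._exclude_names = set(exclude_names)

    def apply_transform(self, messages: List[Dict]) -> List[Dict]:
        to_compress, skipped = [], {}
        for i, msg in enumerate(messages):
            if msg.get("role") in self._exclude_roles or msg.get("name") in self._exclude_names:
                skipped[i] = msg  # keep these messages exactly as they are
            else:
                to_compress.append(msg)
        result = list(self._transform.apply_transform(to_compress))
        # Stitch the untouched messages back into their original positions.
        for i in sorted(skipped):
            result.insert(i, skipped[i])
        return result
```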
@marklysze The compressed text might not make sense to humans, but it makes sense to an LLM; at least that's what the research behind LLMLingua suggests. You can always improve the performance by using a larger model as well, like https://huggingface.co/microsoft/llmlingua-2-xlm-roberta-large-meetingbank. You can also implement a custom text compressor, like the agent summarizer you suggested, and in the prompt ask it not to modify any key information.
My initial thoughts are that it would be good to be able to provide the option to ignore compressing role='system' prompts as they are really crucial and not generally that long.
With the TransformMessages capability, the system prompt is excluded from any transformation by default:
https://github.com/microsoft/autogen/blob/3a4bb088f70776b7a3ca847f487926b8d2e6619b/autogen/agentchat/contrib/capabilities/transform_messages.py#L68-L70
Perhaps also the ability to ignore compressing where name='abc'/'def', etc.
I agree with your suggestion. How about we add this to our backlog for now and revisit it after gathering more user feedback? I don't have a good sense of what users want from message text compressors as this feature hasn't been released yet
@marklysze Also just fyi, you can add custom instructions to llmlingua:
https://github.com/microsoft/LLMLingua/blob/40ac969a82f162b3eb0b8e1f1416756d442e4eec/llmlingua/prompt_compressor.py#L424-L427
You could specify these as an option in `compression_args` in the constructor of `TextMessageCompressor`.
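A hedged example of what that could look like, assuming `compression_args` is forwarded to llmlingua's `compress_prompt` (module paths follow the files in this PR; the instruction text is only illustrative):

```python
from autogen.agentchat.contrib.capabilities.text_compressors import LLMLingua
from autogen.agentchat.contrib.capabilities.transforms import TextMessageCompressor

text_compressor = TextMessageCompressor(
    text_compressor=LLMLingua(),
    compression_args={
        # Forwarded (by assumption) to PromptCompressor.compress_prompt().
        "instruction": "Preserve agent names and quoted text verbatim.",
        "target_token": 1000,
    },
)
```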
Thanks @WaelKarkoub, I'll continue testing but I can see that it's changing agent names as well and that's going to be problematic. Perhaps a different model or different parameters can fix that for select speaker.
For example:
Read the above conversation. Then select ONLY THE NAME of the next speaker from ['Debate_Moderator_Agent', 'Affirmative_Constructive_Debater', 'Negative_Constructive_Debater', 'Affirmative_Rebuttal_Debater', 'Negative_Rebuttal_Debater', 'Debate_Judge'] to speak. Do not explain why.
Is compressed to:
Read above conversation select NAME next speaker Debate _ Moderator _ Agent Affirmative Constructive Debater Negative Debater Affirmative Rebuttal Debater Negative Debate _ Judge speak explain why
With agent names needing to be relatively precise for selection, we'll need to avoid compressing agent names.
Thanks for highlighting the `if messages[0]["role"] == "system":` check; unfortunately, for my test debating example, the prompt is not the first message but the second, and hence is being compressed.
"[{'content': 'debate proposition Dogs better pets', 'role': 'user', 'name': 'Debate_Moderator_Agent'}, {'content': 'Read above conversation select NAME next speaker Debate _ Moderator _ Agent Affirmative Constructive Debater Negative Debater Affirmative Rebuttal Debater Negative Debate _ Judge speak explain why', 'role': 'system'}]"
I'll test with larger models as suggested and check if there are any parameters that can help.
@marklysze Also just fyi, you can add custom instructions to llmlingua:
https://github.com/microsoft/LLMLingua/blob/40ac969a82f162b3eb0b8e1f1416756d442e4eec/llmlingua/prompt_compressor.py#L424-L427
Which you could specify as an option in `compression_args` in the constructor of `TextMessageCompressor`.
Great! I see target_token and that sounds like something very useful. I'll give that a go.
Just an update that using target_token was effective in targeting a specific token count for compression:
```python
text_compressor = TextMessageCompressor(
    text_compressor=LLMLingua(prompt_compressor_kwargs=compression_config),
    compression_args={"target_token": 10000},
)
```
Thanks @WaelKarkoub!
Just noticed that if a message content is empty, then I don't think it should check the cache or compress. So could we also check here if the content is '':
```python
for message in processed_messages:
    # Some messages may not have content.
    if not isinstance(message.get("content"), (str, list)):
        continue
```
Just a note that trying to get the cached value using an empty string was returning a non-empty string.
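A sketch of the suggested extra guard (a hypothetical helper, not the PR's actual code):

```python
from typing import Dict, List


def _messages_to_compress(processed_messages: List[Dict]) -> List[Dict]:
    """Filter out messages that should bypass both the cache and the compressor."""
    eligible = []
    for message in processed_messages:
        content = message.get("content")
        # Some messages may not have content.
        if not isinstance(content, (str, list)):
            continue
        # Suggested extra check: empty content ('' or []) is skipped so we
        # never look up or store a cache entry for an empty string.
        if not content:
            continue
        eligible.append(message)
    return eligible
```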