Text Compression Transform
Why are these changes needed?
This PR introduces text compression by leveraging the LLMLingua library. This addition enhances processing efficiency and response speed by reducing token usage in large language models.
NOTE: LLMLingua uses locally hosted models, so caching might be important here.
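A rough usage sketch, assuming the new transform plugs into the existing TransformMessages capability (module paths follow the files touched in this PR; the `add_to_agent` call is illustrative):

```python
from autogen.agentchat.contrib.capabilities.text_compressors import LLMLingua
from autogen.agentchat.contrib.capabilities.transform_messages import TransformMessages
from autogen.agentchat.contrib.capabilities.transforms import TextMessageCompressor

# Compress long message histories with a locally hosted LLMLingua model
# before they are sent to the LLM.
compressor = TextMessageCompressor(text_compressor=LLMLingua())
context_handling = TransformMessages(transforms=[compressor])
# context_handling.add_to_agent(agent)  # attach the capability to an existing agent
```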
Future work:
- Image Compression
- Video Compression
Related issue number
Closes #2538
Checks
- [ ] I've included any doc changes needed for https://microsoft.github.io/autogen/. See https://microsoft.github.io/autogen/docs/Contribute#documentation to build and test documentation locally.
- [x] I've added tests (if relevant) corresponding to the changes introduced in this PR.
- [x] I've made sure all auto checks have passed.
Codecov Report
Attention: Patch coverage is 25.96154%, with 77 lines in your changes missing coverage. Please review.
Project coverage is 45.11%. Comparing base (ded2d61) to head (ec6fe57). Report is 35 commits behind head on main.
| Files | Patch % | Lines |
|---|---|---|
| ...togen/agentchat/contrib/capabilities/transforms.py | 20.23% | 67 Missing :warning: |
| ...agentchat/contrib/capabilities/text_compressors.py | 50.00% | 10 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #2225 +/- ##
===========================================
+ Coverage 33.33% 45.11% +11.77%
===========================================
Files 83 86 +3
Lines 8636 9108 +472
Branches 1835 2090 +255
===========================================
+ Hits 2879 4109 +1230
+ Misses 5516 4651 -865
- Partials 241 348 +107
| Flag | Coverage Δ | |
|---|---|---|
| unittest | 12.61% <25.96%> (?) | |
| unittests | 44.36% <0.00%> (+11.03%) :arrow_up: | |
@sonichi I'll open a PR to handle deprecation and add a topic. The plan is to merge the deprecation PR before this one to simplify the workflow
There should be an option to not run LLMLingua on the same machine. When deployed on servers, we should be able to run GPU-bound tasks on specialized hardware, not on the same machine where we run agents, which are IO-bound.
Hi @davorrunje, do you have any recommendations for offloading Python workloads to remote machines? I don't have much experience with it, although I do have some experience with Ray. I'm not sure if I can implement it, though. Another idea I had was to serve LLM Lingua on the remote machine and simply request compressed messages. What do you think?
I would suggest abstracting the mechanism using a protocol and then implementing the protocol with local deployment. That way others can easily replace local with distributed deployment, you don't need to do it yourself.
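For illustration, a minimal sketch of that kind of abstraction, assuming a `compress_text` method and llmlingua's `PromptCompressor` underneath (the names are illustrative, not necessarily what this PR ships):

```python
from typing import Any, Dict, Optional, Protocol


class TextCompressor(Protocol):
    """Anything that can compress a piece of text."""

    def compress_text(self, text: str, **compression_params) -> Dict[str, Any]:
        """Returns a dict that includes the compressed text, e.g. under 'compressed_prompt'."""
        ...


class LLMLingua:
    """Local implementation backed by llmlingua's PromptCompressor."""

    def __init__(self, prompt_compressor_kwargs: Optional[Dict] = None) -> None:
        # Imported lazily so llmlingua stays an optional dependency.
        from llmlingua import PromptCompressor

        self._compressor = PromptCompressor(**(prompt_compressor_kwargs or {}))

    def compress_text(self, text: str, **compression_params) -> Dict[str, Any]:
        # Runs the locally hosted model; a remote implementation could satisfy
        # the same protocol by calling a compression service instead.
        return self._compressor.compress_prompt(text, **compression_params)
```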
⚠️ GitGuardian has uncovered 5 secrets following the scan of your pull request.
Please consider investigating the findings and remediating the incidents. Failure to do so may lead to compromising the associated services or software components.
🔎 Detected hardcoded secrets in your pull request
| GitGuardian id | GitGuardian status | Secret | Commit | Filename | |
|---|---|---|---|---|---|
| 10404693 | Triggered | Generic High Entropy Secret | 73b80927eef01525ccf46302f6b71209c7ed8220 | test/oai/test_utils.py | View secret |
| 10404662 | Triggered | Generic CLI Secret | d1e55c404c1b14fcd178433b107332fddfb84898 | .github/workflows/dotnet-release.yml | View secret |
| 10404694 | Triggered | Generic High Entropy Secret | 73b80927eef01525ccf46302f6b71209c7ed8220 | test/oai/test_utils.py | View secret |
| 10404696 | Triggered | Generic High Entropy Secret | 73b80927eef01525ccf46302f6b71209c7ed8220 | test/oai/test_utils.py | View secret |
| 10422482 | Triggered | Generic High Entropy Secret | 73b80927eef01525ccf46302f6b71209c7ed8220 | test/oai/test_utils.py | View secret |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secrets safely. Learn the best practices here.
- Revoke and rotate these secrets.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future consider:
- following these best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
@gagb @sonichi I found some time to finalize text compression. I finalized the protocol and added a new user guide. When this PR gets merged, we can plan to target an autogen version for removing the old code.
@marklysze Just added cache support, let me know what you think of the implementation
Wow, that was fast! I'll check it out :) much appreciated.
Just a question, not related specifically to text compression but to TransformMessages: is it possible to allow passing in a single transform as well as a list of them? So, if they pass in just one, it can be wrapped in a list and the code works as is.
E.g.
`context_handling = transform_messages.TransformMessages(transforms=text_compressor)`
where `_transforms` checks whether it's a list and, if not, just puts the object into a list.
Just caters for users forgetting that it needs to be a list (like me).
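For illustration, a minimal sketch of that convenience (a hypothetical helper; the real constructor may differ):

```python
from typing import Any, List, Union


def _normalize_transforms(transforms: Union[Any, List[Any]]) -> List[Any]:
    # Hypothetical helper for TransformMessages.__init__: wrap a single
    # transform in a list so the rest of the code can keep working with lists.
    if not isinstance(transforms, list):
        return [transforms]
    return transforms
```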
When using LLMLingua, is there any way to suppress the warning:
Token indices sequence length is longer than the specified maximum sequence length for this model (521 > 512). Running this sequence through the model will result in indexing errors
I tried `verbose=False` on the TransformMessages but it still came through. I'm not sure we can control that?
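If that warning comes from the Hugging Face tokenizer that LLMLingua uses under the hood (my assumption, not something verified in this PR), one workaround might be to lower the transformers log level before compressing:

```python
# Assumption: the "Token indices sequence length..." warning is emitted by the
# transformers tokenizer, so silencing that library's logger may hide it.
from transformers import logging as hf_logging

hf_logging.set_verbosity_error()  # only show errors from transformers
```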
What would be the best way to avoid text compression on certain messages?
E.g. for a debating scenario group chat, I added the TextCompression to the select speaker (auto) functionality and I noticed that the debate question:
Please debate the proposition 'Cats make great pets.
was being compressed as:
debate proposition Cats pets.
This may just pass in the Select Speaker case but I can see that the user may want to avoid compressing key messages. My initial thoughts are that it would be good to be able to provide the option to ignore compressing role='system' prompts as they are really crucial and not generally that long. Perhaps also the ability to ignore compressing where name='abc'/'def', etc.
There may be a better way to handle this but would like your thoughts on whether it should be built in or handled through a custom TransformMessage.
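As one possible custom-transform approach: a thin wrapper that skips messages by role or name before delegating to the compressor. A sketch, assuming transforms expose an `apply_transform(messages)` method that preserves message order and count (names here are illustrative):

```python
from typing import Dict, List


class FilteredCompressor:
    """Wraps a transform and leaves messages with excluded roles/names untouched."""

    def __init__(self, transform, exclude_roles=("system",), exclude_names=()):
        self._transform = transform
        self._exclude_roles = set(exclude_roles)
        self._exclude_names = set(exclude_names)

    def apply_transform(self, messages: List[Dict]) -> List[Dict]:
        to_compress, skipped = [], {}
        for i, msg in enumerate(messages):
            if msg.get("role") in self._exclude_roles or msg.get("name") in self._exclude_names:
                skipped[i] = msg  # keep these messages exactly as they are
            else:
                to_compress.append(msg)
        result = list(self._transform.apply_transform(to_compress))
        # Stitch the untouched messages back into their original positions.
        for i in sorted(skipped):
            result.insert(i, skipped[i])
        return result
```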
@marklysze The compressed text might not make sense to humans, but it makes sense to an LLM; at least that's what the research behind LLMLingua suggests. You can always improve the performance by using a larger model as well, like https://huggingface.co/microsoft/llmlingua-2-xlm-roberta-large-meetingbank. You can also implement a custom text compressor, like the agent summarizer you suggested, and in the prompt ask it not to modify any key information.
My initial thoughts are that it would be good to be able to provide the option to ignore compressing role='system' prompts as they are really crucial and not generally that long.
With the TransformMessages capability, the system prompt is excluded from any transformation by default:
https://github.com/microsoft/autogen/blob/3a4bb088f70776b7a3ca847f487926b8d2e6619b/autogen/agentchat/contrib/capabilities/transform_messages.py#L68-L70
Perhaps also the ability to ignore compressing where name='abc'/'def', etc.
I agree with your suggestion. How about we add this to our backlog for now and revisit it after gathering more user feedback? I don't have a good sense of what users want from message text compressors as this feature hasn't been released yet
@marklysze Also just fyi, you can add custom instructions to llmlingua:
https://github.com/microsoft/LLMLingua/blob/40ac969a82f162b3eb0b8e1f1416756d442e4eec/llmlingua/prompt_compressor.py#L424-L427
You could specify these as an option in `compression_args` in the constructor of `TextMessageCompressor`.
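A hedged example of what that could look like, assuming `compression_args` is forwarded to llmlingua's `compress_prompt` (module paths follow the files in this PR; the instruction text is only illustrative):

```python
from autogen.agentchat.contrib.capabilities.text_compressors import LLMLingua
from autogen.agentchat.contrib.capabilities.transforms import TextMessageCompressor

text_compressor = TextMessageCompressor(
    text_compressor=LLMLingua(),
    compression_args={
        # Forwarded (by assumption) to PromptCompressor.compress_prompt().
        "instruction": "Preserve agent names and quoted text verbatim.",
        "target_token": 1000,
    },
)
```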
Thanks @WaelKarkoub, I'll continue testing but I can see that it's changing agent names as well and that's going to be problematic. Perhaps a different model or different parameters can fix that for select speaker.
For example:
Read the above conversation. Then select ONLY THE NAME of the next speaker from ['Debate_Moderator_Agent', 'Affirmative_Constructive_Debater', 'Negative_Constructive_Debater', 'Affirmative_Rebuttal_Debater', 'Negative_Rebuttal_Debater', 'Debate_Judge'] to speak. Do not explain why.
Is compressed to:
Read above conversation select NAME next speaker Debate _ Moderator _ Agent Affirmative Constructive Debater Negative Debater Affirmative Rebuttal Debater Negative Debate _ Judge speak explain why
With agent names needing to be relatively precise for selection, we'll need to avoid compressing agent names.
Thanks for highlighting the `if messages[0]["role"] == "system":` check; unfortunately, for my test debating example, the prompt is not the first message but the second, and hence is being compressed.
"[{'content': 'debate proposition Dogs better pets', 'role': 'user', 'name': 'Debate_Moderator_Agent'}, {'content': 'Read above conversation select NAME next speaker Debate _ Moderator _ Agent Affirmative Constructive Debater Negative Debater Affirmative Rebuttal Debater Negative Debate _ Judge speak explain why', 'role': 'system'}]"
I'll test with larger models as suggested and check if there are any parameters that can help.
@marklysze Also just fyi, you can add custom instructions to llmlingua:
https://github.com/microsoft/LLMLingua/blob/40ac969a82f162b3eb0b8e1f1416756d442e4eec/llmlingua/prompt_compressor.py#L424-L427
Which you could specify as an option in `compression_args` in the constructor of `TextMessageCompressor`.
Great! I see target_token and that sounds like something very useful. I'll give that a go.
Just an update that using target_token was effective in targeting a specific token count for compression:
```python
text_compressor = TextMessageCompressor(
    text_compressor=LLMLingua(prompt_compressor_kwargs=compression_config),
    compression_args={"target_token": 10000},
)
```
Thanks @WaelKarkoub!
Just noticed that if a message content is empty, then I don't think it should check the cache or compress. So could we also check here if the content is '':
```python
for message in processed_messages:
    # Some messages may not have content.
    if not isinstance(message.get("content"), (str, list)):
        continue
```
Just a note that trying to get the cached value using an empty string was returning a non-empty string.
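A sketch of the suggested extra guard (a hypothetical helper, not the PR's actual code):

```python
from typing import Dict, List


def _messages_to_compress(processed_messages: List[Dict]) -> List[Dict]:
    """Filter out messages that should bypass both the cache and the compressor."""
    eligible = []
    for message in processed_messages:
        content = message.get("content")
        # Some messages may not have content.
        if not isinstance(content, (str, list)):
            continue
        # Suggested extra check: empty content ('' or []) is skipped so we
        # never look up or store a cache entry for an empty string.
        if not content:
            continue
        eligible.append(message)
    return eligible
```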