Support for TACO-RL
What does this PR do?
This PR introduces TACO-RL (Task-Aware Prompt Compression Optimization with Reinforcement Learning), a new submodule that extends LLMLingua with reinforcement-learning capabilities: pre-trained compression models can be fine-tuned for new tasks using reward signals from language models such as GPT-3.5.
Key Features Added
New TACO-RL Submodule
- Location: `llmlingua/taco-rl/` - main submodule with the `PromptCompressorReinforce` class
- Experiments: `experiments/taco-rl/` - training scripts, utilities, and configuration files
Research Foundation
Based on the paper "TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning" (arXiv:2409.13035), this implementation addresses:
- Q1: How to design a prompt compression model that effectively leverages bidirectional context while providing low inference latency?
- Q2: How to efficiently train a model with proper guidance from task-specific reward signals while minimizing computational cost?
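To make Q2 concrete, here is a minimal, self-contained sketch of a REINFORCE-style update for a token keep/drop policy, the kind of objective `train_reinforce.py` optimizes. The `KeepDropPolicy` class and `reward_fn` are illustrative placeholders, not the repository's actual code; in practice the reward would come from a task metric (e.g., ROUGE) on the downstream model's output.

```python
import torch
import torch.nn as nn

class KeepDropPolicy(nn.Module):
    """Toy stand-in for the fine-tuned encoder: per-token keep/drop logits."""

    def __init__(self, vocab_size=100, hidden_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.head = nn.Linear(hidden_dim, 2)  # logits for {drop, keep}

    def forward(self, token_ids):
        return self.head(self.embed(token_ids))  # (seq_len, 2)

def reinforce_step(policy, optimizer, token_ids, reward_fn):
    logits = policy(token_ids)
    dist = torch.distributions.Categorical(logits=logits)
    mask = dist.sample()                         # 1 = keep, 0 = drop
    kept = token_ids[mask.bool()]                # the compressed prompt
    reward = reward_fn(kept)                     # task-specific scalar reward
    loss = -reward * dist.log_prob(mask).sum()   # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward

# Tiny smoke test with a placeholder reward (keep ratio); a real run would
# instead score the downstream LLM's answer produced from the kept tokens.
policy = KeepDropPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
tokens = torch.randint(0, 100, (32,))
reinforce_step(policy, opt, tokens, reward_fn=lambda kept: len(kept) / 32.0)
```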
Directory Structure
```
llmlingua/taco-rl/
├── README.md                        # Main documentation
├── prompt_compressor_reinforce.py   # RL-enhanced compressor class
└── __init__.py                      # Module initialization

experiments/taco-rl/
├── README.md                        # Training and implementation guide
├── train_reinforce.py               # Main training script
├── utils.py                         # Utilities and API configuration
├── metrics.py                       # Evaluation metrics
├── configs/                         # Configuration files
│   └── train_reinforce.yaml         # Training configuration
└── logs/                            # Training logs (created during training)
```
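Since training is driven by `configs/train_reinforce.yaml`, the entry point presumably reads it via hydra-core (listed under dependencies below). A minimal sketch of such an entry point follows; the config fields `model_name` and `rate` are hypothetical placeholders, not the actual schema.

```python
# Hypothetical hydra-core entry point reading configs/train_reinforce.yaml;
# cfg.model_name and cfg.rate are placeholder field names.
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train_reinforce", version_base=None)
def main(cfg: DictConfig) -> None:
    print(f"Fine-tuning {cfg.model_name} at compression rate {cfg.rate}")

if __name__ == "__main__":
    main()
```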
Usage Example
```python
from llmlingua.taco_rl import PromptCompressorReinforce

# Load fine-tuned model
compressor = PromptCompressorReinforce(
    model_name="path/to/fine_tuned_model",
    use_llmlingua2=True,
)

# Use for compression during training
compressed_prompt = compressor.compress_prompt_llmlingua2(
    ["Your prompt here..."],
    rate=0.5,
)
```
Dependencies Added
Core Dependencies
- `llmlingua` (the main package)
Additional Dependencies
```bash
pip install openai evaluate csv_logger hydra-core rouge_score
```
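For context, the `evaluate`/`rouge_score` pair is a standard way to compute a ROUGE score that can serve as a reward signal. This is a generic usage example, not the repository's exact metric code:

```python
# Score a model answer against a reference with ROUGE; the returned
# scalar can be used as an RL reward signal.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the compressed prompt led to this answer"],
    references=["the reference answer"],
)
print(scores["rougeL"])
```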
Documentation
- Main README: Overview, architecture, and integration guide
- Experiments README: Detailed training instructions, configuration examples, and troubleshooting
- API Configuration: User guide for setting up Azure OpenAI endpoints
- Evaluation: Links to existing evaluation framework in LLMLingua2 experiments
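As a reference point for the API Configuration guide, an Azure OpenAI client setup typically looks like the sketch below; the environment variable names, API version, and deployment name are placeholders you would replace with your own values.

```python
# Generic Azure OpenAI client setup; endpoint, key, and deployment name
# come from your own Azure resource, not from this repository.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)
response = client.chat.completions.create(
    model="gpt-35-turbo",  # your Azure deployment name
    messages=[{"role": "user", "content": "Compressed prompt goes here"}],
)
print(response.choices[0].message.content)
```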
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?
Who can review?
@iofu728