Support for TACO-RL
What does this PR do?
This PR introduces TACO-RL (Task-Aware Prompt Compression Optimization with Reinforcement Learning), a new submodule that extends LLMLingua with reinforcement-learning capabilities: pre-trained compression models can be fine-tuned for new tasks using reward signals from language models such as GPT-3.5.
Key Features Added
New TACO-RL Submodule
- Location: `llmlingua/taco-rl/` - main submodule with the `PromptCompressorReinforce` class
- Experiments: `experiments/taco-rl/` - training scripts, utilities, and configuration files
Research Foundation
Based on the paper "TACO-RL: Task Aware Prompt Compression Optimization with Reinforcement Learning" (arXiv:2409.13035), this implementation addresses:
- Q1: How to design a prompt compression model that effectively leverages bidirectional context while providing low inference latency?
- Q2: How to efficiently train a model with proper guidance from task-specific reward signals while minimizing computational cost?
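To make Q2 concrete, here is a minimal, self-contained sketch of a REINFORCE-style update for a token keep/drop policy, the kind of objective `train_reinforce.py` optimizes. The `KeepDropPolicy` class and `reward_fn` are illustrative placeholders, not the repository's actual code; in practice the reward would come from a task metric (e.g., ROUGE) on the downstream model's output.

```python
import torch
import torch.nn as nn

class KeepDropPolicy(nn.Module):
    """Toy stand-in for the fine-tuned encoder: per-token keep/drop logits."""

    def __init__(self, vocab_size=100, hidden_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_dim)
        self.head = nn.Linear(hidden_dim, 2)  # logits for {drop, keep}

    def forward(self, token_ids):
        return self.head(self.embed(token_ids))  # (seq_len, 2)

def reinforce_step(policy, optimizer, token_ids, reward_fn):
    logits = policy(token_ids)
    dist = torch.distributions.Categorical(logits=logits)
    mask = dist.sample()                         # 1 = keep, 0 = drop
    kept = token_ids[mask.bool()]                # the compressed prompt
    reward = reward_fn(kept)                     # task-specific scalar reward
    loss = -reward * dist.log_prob(mask).sum()   # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward

# Tiny smoke test with a placeholder reward (keep ratio); a real run would
# instead score the downstream LLM's answer produced from the kept tokens.
policy = KeepDropPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
tokens = torch.randint(0, 100, (32,))
reinforce_step(policy, opt, tokens, reward_fn=lambda kept: len(kept) / 32.0)
```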
Directory Structure
```
llmlingua/taco-rl/
├── README.md                        # Main documentation
├── prompt_compressor_reinforce.py   # RL-enhanced compressor class
└── __init__.py                      # Module initialization

experiments/taco-rl/
├── README.md                        # Training and implementation guide
├── train_reinforce.py               # Main training script
├── utils.py                         # Utilities and API configuration
├── metrics.py                       # Evaluation metrics
├── configs/                         # Configuration files
│   └── train_reinforce.yaml         # Training configuration
└── logs/                            # Training logs (created during training)
```
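Since training is driven by `configs/train_reinforce.yaml`, the entry point presumably reads it via hydra-core (listed under dependencies below). A minimal sketch of such an entry point follows; the config fields `model_name` and `rate` are hypothetical placeholders, not the actual schema.

```python
# Hypothetical hydra-core entry point reading configs/train_reinforce.yaml;
# cfg.model_name and cfg.rate are placeholder field names.
import hydra
from omegaconf import DictConfig

@hydra.main(config_path="configs", config_name="train_reinforce", version_base=None)
def main(cfg: DictConfig) -> None:
    print(f"Fine-tuning {cfg.model_name} at compression rate {cfg.rate}")

if __name__ == "__main__":
    main()
```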
Usage Example
```python
from llmlingua.taco_rl import PromptCompressorReinforce

# Load fine-tuned model
compressor = PromptCompressorReinforce(
    model_name="path/to/fine_tuned_model",
    use_llmlingua2=True,
)

# Use for compression during training
compressed_prompt = compressor.compress_prompt_llmlingua2(
    ["Your prompt here..."],
    rate=0.5,
)
```
Dependencies Added
Core Dependencies
- `llmlingua` (the main package)
Additional Dependencies
```bash
pip install openai evaluate csv_logger hydra-core rouge_score
```
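For context, the `evaluate`/`rouge_score` pair is a standard way to compute a ROUGE score that can serve as a reward signal. This is a generic usage example, not the repository's exact metric code:

```python
# Score a model answer against a reference with ROUGE; the returned
# scalar can be used as an RL reward signal.
import evaluate

rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["the compressed prompt led to this answer"],
    references=["the reference answer"],
)
print(scores["rougeL"])
```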
Documentation
- Main README: Overview, architecture, and integration guide
- Experiments README: Detailed training instructions, configuration examples, and troubleshooting
- API Configuration: User guide for setting up Azure OpenAI endpoints
- Evaluation: Links to existing evaluation framework in LLMLingua2 experiments
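As a reference point for the API Configuration guide, an Azure OpenAI client setup typically looks like the sketch below; the environment variable names, API version, and deployment name are placeholders you would replace with your own values.

```python
# Generic Azure OpenAI client setup; endpoint, key, and deployment name
# come from your own Azure resource, not from this repository.
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)
response = client.chat.completions.create(
    model="gpt-35-turbo",  # your Azure deployment name
    messages=[{"role": "user", "content": "Compressed prompt goes here"}],
)
print(response.choices[0].message.content)
```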
Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- [ ] Was this discussed/approved via a Github issue? Please add a link to it if that's the case.
- [x] Did you make sure to update the documentation with your changes?
- [ ] Did you write any new necessary tests?
Who can review?
@iofu728