agent-lightning
Agent Lightning Backlog Tracker
This issue serves as a central backlog & roadmap tracker for Agent Lightning.
Our core members come from a research and engineering team at MSRA. As our team is still forming, we're currently severely understaffed and welcome new contributors. We encourage all Agent Lightning users to share their thoughts on backlog items. Please comment on which issues interest you most, including your priorities, preferences, and any suggestions you might have.
| Emoji | Status | Description |
|---|---|---|
| 💡 | Idea/Discovery | Needs more investigation or discussion. |
| 🚀 | Ready to Start | Scoped, prioritized, and ready for development. |
| 🏃‍♂️ | In Progress | Someone is actively working on this. |
| 👀 | In Review/Testing/QA | Awaiting code review or in the QA and testing phase. |
| ⛔ | Blocked | Halted, waiting for an answer or a dependency. |
| ✅ | Done | The task is complete. |
| ❌ | Won't Do | This task has been cancelled. |
Core Stability
- β P0 - Bugfix for #37 #63 @hzy46
- 💡 P0 - Bugfix: tracer can't get data from `requests.post` calls
- πP0 - Record rollout level reward before dropping trajectories @hzy46
Documentation and Examples
- πP0 - A framework-less example with Search-R1 #64 @SiyunZhao @hzy46 #147
- 🏃‍♂️ P0 - Debug tutorial on the way
- 🏃‍♂️ P1 - A tool selection example #65 @XufangLuo
New Features
- 💡 P2 - Customizable triplet
Algorithms
- 🏃‍♂️ P0 - Credit Assignment #31
Observability
- 🏃‍♂️ P0 - Sending traces to AgentOps #43 @mydmdm
Backlog v0.3 - what's on my mind:
- Tinker support
- Azure OpenAI SFT support (cloud-sft branch)
- SqliteLightningStore
- Hao's improvement on tracer
- Online RL example
- VERL 0.6 and vllm 0.11 support
- Customizing AgentModeDaemon (probably needs a refactor there)
- Switch to uv for dependency management
- Multi-modality example
- Merge Unsloth SFT trainer into algorithm zoo, and compare APO, VERL and SFT on calc-x
- Unify helpers for: async in sync in async, and uvicorn server start.
- Support automatic optimization of multiple prompts.
- Collect human feedback within algorithms.
Support for VERL 0.6 would be great indeed!