openrl
Unified Reinforcement Learning Framework
### 🚀 Feature [Feature Request] selfplay support more than two players ### Motivation _No response_ ### Additional context _No response_ ### Checklist - [X] I have checked that there is...
### 📚 Documentation add introduction to OpenRL Wrappers ### Checklist - [X] I have checked that there is no similar [issues](https://github.com/OpenRL-Lab/openrl/issues) in the repo - [X] I have read the...
### 🚀 Feature [Feature Request] Add AWR algorithm ### Motivation _No response_ ### Additional context _No response_ ### Checklist - [X] I have checked that there is no similar [issues](https://github.com/OpenRL-Lab/openrl/issues)...
### 🚀 Feature add QMIX ### Motivation _No response_ ### Additional context _No response_ ### Checklist - [X] I have checked that there is no similar [issues](https://github.com/OpenRL-Lab/openrl/issues) in the repo...
### 🚀 Feature Add VDN algorithm, including vdn_net, vdn_module, etc. ### Motivation _No response_ ### Additional context _No response_ ### Checklist - [X] I have checked that there is no...
### 🚀 Feature - Add a CPU-count check to the make function. - If the user tries to allocate more environments than they have CPUs in asynchronous mode, raise the...
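The check requested above could look like the following sketch. The function name `check_env_num` and its signature are assumptions for illustration, not OpenRL's actual API; the idea is simply that asynchronous mode spawns one worker process per environment, so requesting more environments than CPU cores should fail loudly rather than silently oversubscribe.

```python
import multiprocessing


def check_env_num(env_num: int, asynchronous: bool = True) -> None:
    """Hypothetical guard for an env-creation `make` function.

    In asynchronous mode each environment runs in its own worker
    process, so more environments than CPU cores would oversubscribe
    the machine; raise instead of allowing that silently.
    """
    cpu_num = multiprocessing.cpu_count()
    if asynchronous and env_num > cpu_num:
        raise ValueError(
            f"Asynchronous mode requested {env_num} environments, "
            f"but only {cpu_num} CPUs are available. Reduce env_num "
            f"or use synchronous mode."
        )
```

In synchronous mode the check is skipped, since environments are stepped sequentially in a single process and no per-environment worker is spawned.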
### 🐛 Bug agent.save() is not well implemented. The saved file for the NLP task is too large. ### To Reproduce ```python from openrl import ... ``` ### Relevant log output...
### 📚 Documentation I'm rather confused about the use of wrappers and would like a tutorial explaining each individual wrapper. ### Checklist - [X] I have...
### ❓ Question How can I use multiple GPUs? ### Checklist - [x] I have checked that there is no similar [issues](https://github.com/OpenRL-Lab/openrl/issues) in the repo - [x] I have read the [documentation](https://openrl-docs.readthedocs.io/)
I'm thinking it should be possible to use the VLM as both the policy and the evaluator, just with different prompts. I'm trying to use Qwen2.5-VL-3B-Instruct as the basis to create an agent...
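The "same model, different prompts" idea above can be sketched without loading any model: the two roles differ only in the system instruction passed to the chat template. The template strings and the `build_messages` helper below are illustrative assumptions, not part of OpenRL or the Qwen API; the resulting message list is what one would feed to a chat-style VLM such as Qwen2.5-VL-3B-Instruct.

```python
# Hypothetical prompt builder: one VLM plays two roles depending on
# the system instruction it is given.
ROLE_PROMPTS = {
    # Policy role: map the current observation to the next action.
    "policy": (
        "You are an agent acting in an environment. "
        "Given the observation, reply with the single next action."
    ),
    # Evaluator role: score how well the agent is doing.
    "evaluator": (
        "You are a judge. Given the observation and the agent's "
        "recent actions, rate its progress from 0 to 10."
    ),
}


def build_messages(role: str, observation: str) -> list[dict]:
    """Build a chat-format message list for the requested role."""
    return [
        {"role": "system", "content": ROLE_PROMPTS[role]},
        {"role": "user", "content": observation},
    ]
```

The same underlying weights then act as policy or evaluator purely by swapping `role`, which keeps memory usage at a single model instance.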