openrl

Unified Reinforcement Learning Framework

20 openrl issues

### 🚀 Feature [Feature Request] Self-play support for more than two players ### Motivation _No response_ ### Additional context _No response_ ### Checklist - [X] I have checked that there is...

enhancement
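
A minimal sketch of how matchmaking for more than two players could be scheduled. This is illustrative only and not OpenRL API; `round_robin_matches` and `sample_opponents` are hypothetical helper names.

```python
import itertools
import random

def round_robin_matches(policies, players_per_match=3):
    """Yield every combination of policies for an N-player match.

    `policies` is a list of policy objects or IDs; this only illustrates
    how >2-player self-play pairings could be scheduled.
    """
    for match in itertools.combinations(range(len(policies)), players_per_match):
        yield [policies[i] for i in match]

def sample_opponents(pool, learner, num_opponents=2):
    """Pick opponents for the current learner from a frozen policy pool."""
    return [learner] + random.sample(pool, k=num_opponents)
```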

### 📚 Documentation Add an introduction to OpenRL Wrappers ### Checklist - [X] I have checked that there is no similar [issues](https://github.com/OpenRL-Lab/openrl/issues) in the repo - [X] I have read the...

documentation

### 🚀 Feature [Feature Request] Add the AWR (Advantage-Weighted Regression) algorithm ### Motivation _No response_ ### Additional context _No response_ ### Checklist - [X] I have checked that there is no similar [issues](https://github.com/OpenRL-Lab/openrl/issues)...

enhancement
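
For context, a minimal sketch of the AWR policy objective: actions from the replay buffer are re-weighted by the exponentiated advantage. This is a generic PyTorch formulation, not OpenRL code; the function name and defaults are assumptions.

```python
import torch

def awr_policy_loss(log_probs, advantages, beta=0.05, weight_clip=20.0):
    """Advantage-Weighted Regression policy loss (sketch).

    log_probs:  log pi(a|s) for actions sampled from the replay buffer
    advantages: A(s, a) estimates (e.g. returns minus a value baseline)
    beta:       temperature controlling how greedy the weighting is
    """
    # Exponential advantage weights, clipped for numerical stability.
    weights = torch.clamp(torch.exp(advantages / beta), max=weight_clip)
    # Weighted maximum likelihood over buffer actions; no gradient through weights.
    return -(weights.detach() * log_probs).mean()
```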

### 🚀 Feature Add the QMIX algorithm ### Motivation _No response_ ### Additional context _No response_ ### Checklist - [X] I have checked that there is no similar [issues](https://github.com/OpenRL-Lab/openrl/issues) in the repo...

enhancement
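
A minimal sketch of the QMIX mixing network, which is the core piece such a feature would add: per-agent Q-values are mixed into Q_tot by hypernetworks conditioned on the global state, with non-negative mixing weights to keep Q_tot monotonic in each agent's Q. This is not OpenRL's implementation; class and layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class QMixer(nn.Module):
    """Minimal QMIX mixing network (sketch)."""

    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents = n_agents
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Sequential(nn.Linear(state_dim, embed_dim),
                                      nn.ReLU(),
                                      nn.Linear(embed_dim, 1))

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        batch = agent_qs.size(0)
        # Absolute values keep mixing weights non-negative (monotonicity).
        w1 = torch.abs(self.hyper_w1(state)).view(batch, self.n_agents, -1)
        b1 = self.hyper_b1(state).view(batch, 1, -1)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.hyper_w2(state)).view(batch, -1, 1)
        b2 = self.hyper_b2(state).view(batch, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(batch, 1)
```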

### 🚀 Feature Add the VDN algorithm, including vdn_net, vdn_module, etc. ### Motivation _No response_ ### Additional context _No response_ ### Checklist - [X] I have checked that there is no...

enhancement
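
For reference, the VDN mixer itself is trivial: the joint value is the sum of per-agent Q-values. A generic PyTorch sketch (not the requested vdn_net/vdn_module code):

```python
import torch
import torch.nn as nn

class VDNMixer(nn.Module):
    """Value Decomposition Network mixer (sketch): Q_tot is simply the
    sum of per-agent Q-values, so credit assignment is purely additive."""

    def forward(self, agent_qs):
        # agent_qs: (batch, n_agents) -> (batch, 1)
        return agent_qs.sum(dim=1, keepdim=True)
```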

### 🚀 Feature - Add a CPU-count check to the make function. - If the user tries to allocate more environments than available CPU cores in asynchronous mode, raise the...

enhancement
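
A minimal sketch of such a guard, assuming asynchronous mode runs one worker process per environment. The function name is hypothetical and this is not OpenRL's actual `make` code.

```python
import multiprocessing

def check_env_num(env_num: int, asynchronous: bool) -> None:
    """Hypothetical guard for an env-creation helper.

    In asynchronous mode each environment runs in its own worker process,
    so requesting more environments than CPU cores is likely a mistake.
    """
    cpu_count = multiprocessing.cpu_count()
    if asynchronous and env_num > cpu_count:
        raise ValueError(
            f"Asked for {env_num} asynchronous environments but only "
            f"{cpu_count} CPU cores are available; reduce env_num or "
            f"switch to synchronous mode."
        )
```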

### πŸ› Bug agent.save() is not well implemented. The saved file for nlp task is too large. ### To Reproduce ```python from openrl import ... ``` ### Relevant log output...

bug
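
One possible direction for a fix, sketched generically: checkpoint only the trainable parameters so a frozen pretrained language-model backbone is not duplicated into every save. These helper names are hypothetical and this is not OpenRL's `agent.save()`.

```python
import torch

def save_trainable_weights(model: torch.nn.Module, path: str) -> None:
    """Store only parameters that require gradients."""
    trainable = {name: p.detach().cpu()
                 for name, p in model.named_parameters()
                 if p.requires_grad}
    torch.save(trainable, path)

def load_trainable_weights(model: torch.nn.Module, path: str) -> None:
    """Restore the saved subset on top of a freshly constructed model."""
    model.load_state_dict(torch.load(path), strict=False)
```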

### 📚 Documentation I'm rather confused about the use of wrappers and would like a tutorial explaining how to use each individual wrapper. ### Checklist - [X] I have...

documentation
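
Until such a tutorial exists, a minimal sketch of the wrapper pattern, using a plain Gymnasium reward wrapper for illustration (OpenRL ships its own wrapper utilities; check the OpenRL docs for their exact names). A wrapper takes an inner environment and overrides one piece of its behaviour, forwarding everything else unchanged.

```python
import gymnasium as gym

class ClipReward(gym.RewardWrapper):
    """Clip every step reward into [low, high]; all other calls are
    forwarded to the wrapped environment unchanged."""

    def __init__(self, env, low=-1.0, high=1.0):
        super().__init__(env)
        self.low, self.high = low, high

    def reward(self, reward):
        return max(self.low, min(self.high, reward))

env = ClipReward(gym.make("CartPole-v1"))
obs, info = env.reset()
```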

### ❓ Question ζ€ŽδΉˆζ”―ζŒε€šgpu? ### Checklist - [x] I have checked that there is no similar [issues](https://github.com/OpenRL-Lab/openrl/issues) in the repo - [x] I have read the [documentation](https://openrl-docs.readthedocs.io/)

question
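
A generic data-parallel sketch, not OpenRL-specific (check the OpenRL documentation for its own multi-GPU options). If several GPUs are visible, `torch.nn.DataParallel` replicates the policy network across them and splits each forward batch automatically; for multi-process setups, `DistributedDataParallel` launched via `torchrun` is the usual alternative.

```python
import torch

def wrap_for_multi_gpu(policy_net: torch.nn.Module) -> torch.nn.Module:
    """Replicate the policy network across all visible GPUs (sketch)."""
    if torch.cuda.device_count() > 1:
        policy_net = torch.nn.DataParallel(policy_net)
    return policy_net.to("cuda" if torch.cuda.is_available() else "cpu")
```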

I'm thinking it should be possible to use the VLM as both the policy and the evaluator, just with different prompts. I'm trying to use Qwen2.5-VL-3B-Instruct as the basis for creating an agent...
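
A conceptual sketch of that idea: the same model serves both roles, distinguished only by the prompt. `query_vlm` is a hypothetical helper standing in for whatever inference call wraps Qwen2.5-VL-3B-Instruct; it is not part of OpenRL, and the prompts are placeholders.

```python
POLICY_PROMPT = ("You control the agent. Given the screenshot, "
                 "reply with one action: left, right, up or down.")
REWARD_PROMPT = ("You are a judge. Given the screenshot and the action taken, "
                 "reply with a score from 0 to 1.")

def act(query_vlm, image):
    """Use the VLM as the policy: same model, action-selection prompt."""
    return query_vlm(prompt=POLICY_PROMPT, image=image).strip().lower()

def evaluate(query_vlm, image, action):
    """Use the same VLM as the evaluator: different prompt, numeric score."""
    reply = query_vlm(prompt=f"{REWARD_PROMPT}\nAction: {action}", image=image)
    return float(reply.strip())
```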