Mava
🦁 A research-friendly codebase for fast experimentation with multi-agent reinforcement learning in JAX
Title says it all. On TPUv3, try setting `num_eval_episodes = 1`
### Feature There have been many discussions about moving our Python tooling to [ruff](https://github.com/astral-sh/ruff), as has been done in many popular libraries in the community. This will be a big once-off effort,...
### Please describe the purpose of the feature. Is it related to a problem? It would be very nice if the evaluator function was generic enough for future algorithms. It...
As discussed in #994 we should now be able to use the `MetricsWrapper` and the `TrainState` in the evaluator instead of manually recording the episode return and creating a special...
### Describe the bug It seems like we are not using the correct termination vs truncation values: we always use the condition `termination or truncation` (`timestep.last()`) when we often want to...
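To illustrate the distinction the issue above is pointing at: in dm_env/Jumanji-style timesteps, `timestep.last()` is true for both termination and truncation, while the discount distinguishes the two (a discount of zero signals true termination, so only then should the bootstrap value be zeroed). A minimal sketch of that per-step logic, using hypothetical names (`split_done_flags` is not part of Mava's API):

```python
def split_done_flags(is_last: bool, discount: float) -> tuple[bool, bool]:
    """Split a combined 'done' signal into (terminated, truncated).

    `is_last` corresponds to `timestep.last()`: true on the final step of
    an episode, whether it ended naturally or was cut off by a time limit.
    A zero discount on that final step marks a true termination; a nonzero
    discount marks a truncation, where bootstrapping should still happen.
    """
    terminated = is_last and discount == 0.0
    truncated = is_last and discount != 0.0
    return terminated, truncated
```

In a value target, `terminated` would zero out the next-state value while `truncated` would keep it, which is exactly the behaviour that collapsing both cases into `timestep.last()` loses.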
## What? Implement the Sebulba architecture with feedforward IPPO on RWARE. ## Why? Integrate Sebulba's architecture for its effectiveness with non-jitted/non-JAX environments. ## How? Enhance the existing [Cleanba](https://github.com/vwxyzjn/cleanba)...
### Please describe what needs to be maintained? This came up in #955. Unfortunately it's quite messy to check the equality of two jaxmarl spaces as they don't have custom...
Having the `LogWrapper` and `LogEnvState` means that we often have to do `state.env_state`, which is quite confusing. Just tracking the logging metrics separately in their own state would make...
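One way to read the suggestion above: rather than a wrapper state that nests the real environment state (forcing `state.env_state` everywhere), the episode metrics can live in their own flat structure carried alongside the env state. A minimal stdlib-only sketch, with hypothetical names (`EpisodeMetrics` and `step_metrics` are illustrative, not Mava's actual types):

```python
from typing import NamedTuple


class EpisodeMetrics(NamedTuple):
    """Running per-episode logging state, kept separate from the env state."""
    episode_return: float = 0.0
    episode_length: int = 0


def step_metrics(metrics: EpisodeMetrics, reward: float, done: bool):
    """Accumulate return/length; on episode end, emit the finished
    episode's metrics and reset the running state.

    Returns (completed_episode_or_None, new_running_metrics).
    """
    running = EpisodeMetrics(
        episode_return=metrics.episode_return + reward,
        episode_length=metrics.episode_length + 1,
    )
    if done:
        return running, EpisodeMetrics()  # fresh state for the next episode
    return None, running
```

With this shape the training loop threads `EpisodeMetrics` next to the env state instead of inside it, so the env state is always accessed directly.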
### Feature Because we have a separate actor and critic, we need to update both networks separately, which leads to a lot of code duplication,...
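To sketch how the duplication above could be factored out: bundle each network's parameters with its own update rule, then drive both through one generic loop instead of two near-identical update functions. This is a stdlib-only illustration under assumed names (`TrainComponent`, `apply_updates` are hypothetical, not Mava's API; real code would hold optimiser states and use jitted gradient updates):

```python
from typing import Any, Callable, Dict, NamedTuple


class TrainComponent(NamedTuple):
    """A network's parameters paired with its own update rule."""
    params: Any
    update: Callable[[Any, Any], Any]  # (params, batch) -> new params


def apply_updates(
    components: Dict[str, TrainComponent], batch: Any
) -> Dict[str, TrainComponent]:
    """Run every component's update in one generic loop, so actor and
    critic share a single code path instead of duplicated update logic."""
    return {
        name: comp._replace(params=comp.update(comp.params, batch))
        for name, comp in components.items()
    }


# Toy usage: stand-in update rules in place of real gradient steps.
actor = TrainComponent(params=1.0, update=lambda p, b: p + b)
critic = TrainComponent(params=2.0, update=lambda p, b: p * b)
state = apply_updates({"actor": actor, "critic": critic}, batch=3.0)
```

The design choice is that each network owns its loss/optimiser pairing, so adding a third network (e.g. for a new algorithm) means registering one more component rather than copying another update function.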
Previously it was not possible to use a CNN in Mava's recurrent systems. This PR makes CNNs compatible with recurrent systems and adds a relevant config for a recurrent CNN...