Mava

🦁 A research-friendly codebase for fast experimentation of multi-agent reinforcement learning in JAX

Results 160 Mava issues

### Describe the bug
JAX system logs look different from TF system logs.
### To Reproduce
Steps to reproduce the behavior:
1. Run jax example.
2. Run tf example....

bug

### Please describe the purpose of the feature. Is it related to a problem?
This is a nitpick issue which might not be needed at all. But if we can...

enhancement

### Describe the bug
The current flatland docker image doesn't work. Error: `module 'jaxlib.xla_extension' has no attribute '__path__'`, due to the version of cloudpickle that is required by gym 0.14 (flatland...

bug

### What do you want to investigate?
MADDPG and MAD4PG are both significantly slower than MAPPO in certain instances.
1. Run MADDPG on Coop pong/PCB Grid for n steps with...

bug

### What do you want to investigate?
Updating DDPG with only the critic network yields better performance. For this to be carried out, we need to update the DDPG actor...

### Feature
For a small performance boost we should consider moving the tf.functions from [_minibatch_update](https://github.com/instadeepai/Mava/blob/develop/mava/systems/tf/mappo/training.py#L274) to [_step](https://github.com/instadeepai/Mava/blob/develop/mava/systems/tf/mappo/training.py#L285).
### Testing
Validate that the trainer is running faster with this change. To...

enhancement

### Describe the bug
In the MADDPG system, regardless of the net spec keys given to the `network_factory` (i.e. `create_default_networks`), the system overwrites these values. This causes issues when trying...

bug

### Feature
A piecewise linear scheduler for epsilon. With a piecewise linear scheduler, the user can increase and decrease epsilon over the desired time intervals.
### Proposal
Creating a new...

enhancement
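
The piecewise linear epsilon scheduler requested above could look roughly like the sketch below. It is only an illustration of the idea: the class name `PiecewiseLinearSchedule`, the `(step, epsilon)` breakpoint format, and the choice to hold epsilon constant outside the breakpoints are all assumptions, not part of Mava's API or of the issue's (truncated) proposal.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


class PiecewiseLinearSchedule:
    """Hypothetical epsilon schedule; not an existing Mava class."""

    def __init__(self, breakpoints: Sequence[Tuple[int, float]]) -> None:
        if not breakpoints:
            raise ValueError("need at least one (step, epsilon) breakpoint")
        steps = [step for step, _ in breakpoints]
        if any(later <= earlier for earlier, later in zip(steps, steps[1:])):
            raise ValueError("breakpoint steps must be strictly increasing")
        self._steps = steps
        self._values = [float(value) for _, value in breakpoints]

    def __call__(self, step: int) -> float:
        # Assumption: epsilon is held constant before the first and
        # after the last breakpoint.
        if step <= self._steps[0]:
            return self._values[0]
        if step >= self._steps[-1]:
            return self._values[-1]
        # Find the segment [s0, s1) containing `step`, then interpolate.
        i = bisect_right(self._steps, step)
        s0, s1 = self._steps[i - 1], self._steps[i]
        v0, v1 = self._values[i - 1], self._values[i]
        frac = (step - s0) / (s1 - s0)
        return v0 + frac * (v1 - v0)


# Epsilon decays, rises again, then decays to its final value.
schedule = PiecewiseLinearSchedule(
    [(0, 1.0), (1000, 0.1), (2000, 0.5), (3000, 0.05)]
)
```

Calling `schedule(step)` with the current step would return the epsilon for that step; because the breakpoints are arbitrary, the same object covers both increasing and decreasing intervals.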

### Describe the bug
The executors in MADDPG pass no `set_keys` argument to the variable client. This means that `set_keys` defaults to `get_keys`. We don't want the executors...

bug
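
The defaulting behaviour described in the `set_keys` issue above can be illustrated with a toy client. This is a sketch only: `ToyVariableClient`, its dict-backed `server`, and the idea of passing an empty `set_keys` list are assumptions made for illustration and do not reproduce Mava's actual variable client or the fix the issue goes on to propose.

```python
from typing import Any, Dict, List, Optional


class ToyVariableClient:
    """Hypothetical stand-in for a variable client (not Mava's real class)."""

    def __init__(
        self,
        server: Dict[str, Any],
        get_keys: List[str],
        set_keys: Optional[List[str]] = None,
    ) -> None:
        self._server = server
        self.get_keys = list(get_keys)
        # Assumed default, mirroring the issue text: with no `set_keys`
        # argument, the client may write every key it can read.
        self.set_keys = list(get_keys) if set_keys is None else list(set_keys)

    def get(self) -> Dict[str, Any]:
        """Read the keys this client is allowed to fetch."""
        return {key: self._server[key] for key in self.get_keys}

    def set(self, values: Dict[str, Any]) -> None:
        """Write only the keys listed in `set_keys`; ignore the rest."""
        for key in self.set_keys:
            if key in values:
                self._server[key] = values[key]


server = {"policy": 1, "critic": 2}

# No `set_keys` passed: write access silently matches read access.
implicit = ToyVariableClient(server, get_keys=["policy", "critic"])
# Explicit empty list: this client can read but never write.
explicit = ToyVariableClient(server, get_keys=["policy", "critic"], set_keys=[])
```

In this toy version, `implicit.set(...)` overwrites values on the shared `server`, while `explicit.set(...)` is a no-op, which is the distinction the issue draws attention to.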

### Describe the bug
This makes loading from a checkpoint and resuming training problematic as new optimizers are initialised each time.
### Solution
Add the optimizer states to the variable...

bug