Chen Fang
Since you want to use MARLlib's algorithms, I guess you may need to override the abstract class `MultiAgentEnv` provided by Ray, or write a wrapper for the algorithm...
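Something like this, as a rough sketch of the wrapper idea built directly on Ray's `MultiAgentEnv` interface (the underlying `MyRawEnv` and the agent ids are placeholders, and MARLlib may expect extra environment metadata on top of this):

```python
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MyRawEnvWrapper(MultiAgentEnv):
    """Wraps a hypothetical environment into RLlib's multi-agent dict API."""

    def __init__(self, env_config):
        self.env = MyRawEnv(**env_config)      # placeholder for your own env
        self.agents = ["agent_0", "agent_1"]   # agent ids RLlib will key on

    def reset(self):
        obs = self.env.reset()
        # RLlib expects {agent_id: obs} dicts
        return {aid: obs[i] for i, aid in enumerate(self.agents)}

    def step(self, action_dict):
        obs, rewards, done, info = self.env.step(
            [action_dict[aid] for aid in self.agents]
        )
        return (
            {aid: obs[i] for i, aid in enumerate(self.agents)},
            {aid: rewards[i] for i, aid in enumerate(self.agents)},
            {"__all__": done},
            {aid: info for aid in self.agents},
        )
```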
You can try overriding the `postprocess_fn` in `PPOTorchPolicy`. In MARLlib, one such example is: https://github.com/Replicable-MARL/MARLlib/blob/368c6173577d0f9c0ad70fb5b4b6afa12c864c15/marllib/marl/algos/core/CC/coma.py#L116-L125. The signature of `postprocess_fn` is fixed:
```
postprocess_fn(policy: Policy, sample_batch: SampleBatch,
               other_agent_batches=None, episode=None) -> SampleBatch
```
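To make that concrete, a minimal sketch (not MARLlib's actual code) of swapping in a custom `postprocess_fn` via `with_updates`, in the spirit of the linked COMA example; it assumes the Ray 1.x policy-template API that MARLlib builds on, and the reward tweak is just a placeholder:

```python
from ray.rllib.agents.ppo.ppo_torch_policy import PPOTorchPolicy
from ray.rllib.evaluation.postprocessing import compute_gae_for_sample_batch
from ray.rllib.policy.sample_batch import SampleBatch


def my_postprocess(policy, sample_batch, other_agent_batches=None, episode=None):
    # Placeholder edit of the collected trajectory (e.g. reward shaping).
    sample_batch[SampleBatch.REWARDS] = sample_batch[SampleBatch.REWARDS] * 1.0
    # Keep PPO's usual GAE postprocessing so the loss still finds advantages.
    return compute_gae_for_sample_batch(
        policy, sample_batch, other_agent_batches, episode
    )


CustomPPOTorchPolicy = PPOTorchPolicy.with_updates(
    name="CustomPPOTorchPolicy",
    postprocess_fn=my_postprocess,
)
```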
@nikhil-pitta it is called before both the policy gradient and value function gradient. The pipeline is basically: extra_action_out_fn → postprocess_fn → loss_fn → compute_gradients → apply_gradients.
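As an illustration of that ordering: whatever `extra_action_out_fn` returns is stored as extra columns in the rollout batch, which `postprocess_fn` can then read before `loss_fn` ever sees the batch. The names below are illustrative, not from MARLlib:

```python
def add_value_column(policy, input_dict, state_batches, model, action_dist):
    # Runs at action-computation time; the returned dict becomes extra
    # columns in the collected sample batch.
    return {"my_value_estimate": model.value_function()}


def use_value_column(policy, sample_batch, other_agent_batches=None, episode=None):
    # Runs after the rollout is collected and before loss_fn, so the column
    # written above is already available here.
    value_estimates = sample_batch["my_value_estimate"]
    # ... use value_estimates to rewrite rewards, advantages, etc.
    return sample_batch
```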
@nikhil-pitta You mentioned "augment our current step/collected experiences and add to the replay buffer", and that sounds exactly like what `postprocess_fn` does, as in our earlier discussion. This extra function applies to...
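A rough sketch of that idea, assuming the standard `SampleBatch` API (the noise-based augmentation is only a placeholder; whether such an augmentation is valid for your algorithm is up to you):

```python
import numpy as np

from ray.rllib.policy.sample_batch import SampleBatch


def augmenting_postprocess(policy, sample_batch, other_agent_batches=None, episode=None):
    # Duplicate the collected transitions with perturbed observations, so the
    # batch handed to the replay buffer / trainer contains the extra experiences.
    augmented = sample_batch.copy()
    augmented[SampleBatch.OBS] = augmented[SampleBatch.OBS] + np.random.normal(
        scale=0.01, size=augmented[SampleBatch.OBS].shape
    )
    return sample_batch.concat(augmented)
```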
@nikhil-pitta Note that `JointQPolicy` inherits from the `Policy` class, which has a `postprocess_trajectory` method: https://github.com/ray-project/ray/blob/55fc0710d8472a9abaf244ed6567eb3b13136531/rllib/policy/policy.py#L361-L366. Directly overriding this function may help.
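A minimal sketch of that (the import path and the augmentation body are my guesses, not MARLlib's actual code; adjust the import to wherever MARLlib defines `JointQPolicy`):

```python
# Adjust this import to the module where MARLlib defines JointQPolicy.
from marllib.marl.algos.core.VD.iql_vdn_qmix import JointQPolicy


class AugmentedJointQPolicy(JointQPolicy):
    def postprocess_trajectory(self, sample_batch, other_agent_batches=None, episode=None):
        # Let the original postprocessing run first, then edit the batch
        # before it is added to the replay buffer.
        sample_batch = super().postprocess_trajectory(
            sample_batch, other_agent_batches, episode
        )
        # ... augment sample_batch here
        return sample_batch
```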
I remember joint Q learning supports `share_policy=all`; you can see the related logic here: https://github.com/Replicable-MARL/MARLlib/blob/368c6173577d0f9c0ad70fb5b4b6afa12c864c15/marllib/marl/algos/run_vd.py#L105-L118. Try to adapt the code under this setting.
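For example, something along these lines with MARLlib's high-level API (environment, algorithm, and hyperparameters are placeholders for your own setup):

```python
from marllib import marl

# Placeholder environment / algorithm choices; swap in your own.
env = marl.make_env(environment_name="mpe", map_name="simple_spread", force_coop=True)
vdn = marl.algos.vdn(hyperparam_source="mpe")
model = marl.build_model(env, vdn, {"core_arch": "mlp", "encode_layer": "128-128"})

# share_policy="all" makes every agent share one set of parameters.
vdn.fit(env, model, share_policy="all", stop={"timesteps_total": 1_000_000})
```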