Jiayi Weng

Results: 303 comments of Jiayi Weng

I went through the above discussion again. So if the environment produces all agents' steps simultaneously and uses only one policy, **there's no need to use/follow MAPM**. Instead, treat this...
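To make that concrete, here is a minimal sketch (my own illustration, not code from this thread) of wrapping a simultaneous-move multi-agent environment as a plain single-agent `gym.Env`. `DummyMAEnv`, its interface, and the summed joint reward are all assumptions:

```python
import gym
import numpy as np
from gym import spaces


class DummyMAEnv:
    """Toy stand-in for a simultaneous-move multi-agent env (an assumption,
    not from this thread): two agents, each observing a 4-dim vector and
    taking a binary action; both agents step at the same time."""
    num_agents = 2

    def reset(self):
        return [np.zeros(4, dtype=np.float32) for _ in range(self.num_agents)]

    def step(self, actions):
        obs = [np.random.randn(4).astype(np.float32) for _ in range(self.num_agents)]
        rews = [float(a) for a in actions]
        done = np.random.rand() < 0.05
        return obs, rews, done, {}


class JointObsEnv(gym.Env):
    """Expose the joint observation/action of all simultaneous agents as one
    single-agent transition, so a single shared policy can be trained
    directly.  Summing per-agent rewards is an assumption; adapt as needed."""

    def __init__(self, ma_env):
        self.ma_env = ma_env
        n = ma_env.num_agents
        self.observation_space = spaces.Box(
            -np.inf, np.inf, shape=(n * 4,), dtype=np.float32)
        self.action_space = spaces.MultiDiscrete([2] * n)

    def reset(self):
        return np.concatenate(self.ma_env.reset())

    def step(self, joint_action):
        obs, rews, done, info = self.ma_env.step(list(joint_action))
        return np.concatenate(obs), float(sum(rews)), done, info
```

With a wrapper like this, the usual single-agent collector/trainer pipeline applies unchanged.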

> * compute_episodic_return: change how m is calculated
> * learn: change how ratio and u are calculated
> * I would have to change every algorithm

Yep, that's what...
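For readers without the Tianshou source at hand, here is a generic sketch of where those two quantities live. This is standard GAE/PPO math, not Tianshou's exact implementation, and `u` presumably refers to one of the surrogate terms below:

```python
import numpy as np


def compute_episodic_return(rew, done, v_s, v_s_, gamma=0.99, gae_lambda=0.95):
    """GAE-style episodic return over np arrays (a generic sketch).
    `m` is the per-step discount mask discussed above: it zeroes the
    bootstrap at episode boundaries, and is the term a multi-agent
    variant would redefine."""
    m = (1.0 - done) * gamma                      # "change how m is calculated"
    gae, returns = 0.0, np.zeros_like(rew)
    for t in reversed(range(len(rew))):
        delta = rew[t] + m[t] * v_s_[t] - v_s[t]  # one-step TD error
        gae = delta + m[t] * gae_lambda * gae
        returns[t] = gae + v_s[t]
    return returns


def ppo_surrogate(logp, logp_old, adv, eps_clip=0.2):
    """Clipped PPO objective; `ratio` is the term the `learn` step would
    recompute in a multi-agent variant."""
    ratio = np.exp(logp - logp_old)               # "change how ratio is calculated"
    surr1 = ratio * adv
    surr2 = np.clip(ratio, 1 - eps_clip, 1 + eps_clip) * adv
    return np.minimum(surr1, surr2).mean()        # maximize this quantity
```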

> Just for clarification, according to your current idea, would I still need to change other parts, such as the forward pass of the neural network?

None of them I...

> Or should I formulate the fixed agent as a part of the environment and train without multi-agent settings?

This is a good approach in my opinion. You can switch...
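A minimal sketch of that approach, with hypothetical interfaces (a turn-based two-player env and a frozen `opponent_fn(obs) -> action`): the opponent's move happens inside the wrapper, so the learner sees an ordinary single-agent environment.

```python
import gym


class FixedOpponentEnv(gym.Env):
    """Fold a frozen opponent policy into the environment so the learning
    agent can be trained single-agent.  The zero-sum reward handling below
    is an assumption; adapt it to your game."""

    def __init__(self, two_player_env, opponent_fn):
        self.env = two_player_env
        self.opponent_fn = opponent_fn
        self.observation_space = two_player_env.observation_space
        self.action_space = two_player_env.action_space

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # the learner's move
        obs, rew, done, info = self.env.step(action)
        if done:
            return obs, rew, done, info
        # the fixed opponent's move, executed inside the wrapper
        opp_action = self.opponent_fn(obs)
        obs, opp_rew, done, info = self.env.step(opp_action)
        # zero-sum assumption: the opponent's reward counts against us
        return obs, rew - opp_rew, done, info
```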

Thanks for posting the code! I'll take a look this weekend. Btw, it seems I cannot run `watch_selfplay` correctly:

```bash
Traceback (most recent call last):
  File "test_tic_tac_toe.py", line 22, in...
```

I know the issue.

```python
test_collector = Collector(policy, test_envs, exploration_noise=True)
```

Sorry about that, I forgot to change this line in #280, will fix soon.
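For context, a sketch of where that flag goes in a typical setup; the CartPole envs are placeholders and `policy` is whatever `BasePolicy` instance you train:

```python
import gym
from tianshou.data import Collector, ReplayBuffer
from tianshou.env import DummyVectorEnv

# placeholder envs just to show where exploration_noise goes
train_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(8)])
test_envs = DummyVectorEnv([lambda: gym.make("CartPole-v0") for _ in range(8)])

train_collector = Collector(policy, train_envs, ReplayBuffer(20000),
                            exploration_noise=True)
test_collector = Collector(policy, test_envs, exploration_noise=True)
```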

Btw, the example you provided looks great! Are you interested in making a pull request to improve this example?

Where do you want to use this method?

> (maybe the stack method in buffer can fulfill your need?)

Probably not. It needs to maintain each episode's start and end indices and customize `buffer.sample_index` (like `np.random.choice(np.nonzero(self.done)[0], batch_size)` will...
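A rough sketch of such a customization, built around the expression above; whether `sample_index` is the right hook to override depends on your Tianshou version:

```python
import numpy as np
from tianshou.data import ReplayBuffer


class EpisodeEndBuffer(ReplayBuffer):
    """Sample only transitions where an episode terminates (a sketch)."""

    def sample_index(self, batch_size: int) -> np.ndarray:
        end_indices = np.nonzero(self.done)[0]  # indices of episode ends
        if end_indices.size == 0:
            return np.array([], dtype=int)
        # replace=True keeps this valid even when batch_size > #episodes
        return np.random.choice(end_indices, batch_size)
```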

Currently we don't have such a plan, but you're welcome to submit a pull request!