Tianhong Dai
@jiameij Sorry for replying so late - it's a thread-blocking problem. I have solved it; you need to add `os.environ['OMP_NUM_THREADS'] = '1'`. I will revise it to pytorch-0.4.1 in...
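For reference, a minimal sketch of where that line has to go (the entry-point layout here is an assumption, not the repo's exact `train.py`): the environment variable must be set before `torch`/`numpy` are imported, otherwise each MPI worker can spawn multiple OpenMP threads and they block each other.

```python
# Sketch: limit each MPI worker to one OpenMP thread *before* importing torch.
import os
os.environ['OMP_NUM_THREADS'] = '1'   # avoid thread blocking across MPI workers
os.environ['MKL_NUM_THREADS'] = '1'   # (assumption) MKL usually needs the same limit

import torch  # imported only after the thread limits are in place

if __name__ == '__main__':
    print('intra-op threads per worker:', torch.get_num_threads())
```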
@LingfengTao Hi - Yes, you can use it to train the hand manipulation envs.
> Is the experience data collected during training stored on the hard disk or in memory? If it's in memory and the state space contains images, won't it fill up very quickly? --- A newcomer to RL asking for help

It is stored in memory; and indeed, if the state space contains images, memory fills up quickly. If you have to store images, you can try saving them in uint8 format to keep the memory usage as low as possible.
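A minimal sketch of that uint8 trick (this `ImageReplayBuffer` is hypothetical, not the repo's `replay_buffer.py`): store raw pixels as uint8 (1 byte per channel, 4x smaller than float32) and only convert to float when sampling a mini-batch.

```python
import numpy as np

class ImageReplayBuffer:
    def __init__(self, size, obs_shape=(84, 84, 3)):
        # uint8 storage: 1 byte per value instead of 4 for float32
        self.obs = np.zeros((size,) + obs_shape, dtype=np.uint8)
        self.size, self.ptr, self.full = size, 0, False

    def store(self, image):
        # assume the incoming image is already in the range [0, 255]
        self.obs[self.ptr] = image.astype(np.uint8)
        self.ptr = (self.ptr + 1) % self.size
        self.full = self.full or self.ptr == 0

    def sample(self, batch_size):
        high = self.size if self.full else self.ptr
        idx = np.random.randint(0, high, size=batch_size)
        # convert to float and normalize only for the sampled mini-batch
        return self.obs[idx].astype(np.float32) / 255.0
```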
@Ericonaldo Hi, actually `MPI = a large batch size`. Could I know what batch size (the larger batch size) you used when training the push task, please?
@Ericonaldo Hi - My guess is that it's because of the diversity of samples - before the agent updates the network, if you use a single process, in each epoch it will...
@Ericonaldo Hmm - that's a good point. An interesting finding is here: [https://github.com/TianhongDai/hindsight-experience-replay/blob/master/mpi_utils/mpi_utils.py#L21-L22](https://github.com/TianhongDai/hindsight-experience-replay/blob/master/mpi_utils/mpi_utils.py#L21-L22) . I follow OpenAI's setting; they use `sum` instead of `avg` to gather the gradients...
@Ericonaldo Yes - the HER implementation is quite tricky...
@Ericonaldo I found that the `SUM` operator influences the performance: [https://github.com/TianhongDai/hindsight-experience-replay/blob/master/mpi_utils/mpi_utils.py#L21-L22](https://github.com/TianhongDai/hindsight-experience-replay/blob/master/mpi_utils/mpi_utils.py#L21-L22) Here, instead of using `SUM`, I average the gradients according to the number of MPI workers as: ```python...
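A rough sketch of the change being described (the function name `sync_grads_avg` is made up for illustration and this is not the exact code in `mpi_utils.py`): all-reduce the flattened gradients with `SUM`, then divide by the number of MPI workers so the update uses the average gradient rather than the sum.

```python
from mpi4py import MPI
import numpy as np
import torch

def sync_grads_avg(network):
    comm = MPI.COMM_WORLD
    # flatten all local gradients into one vector
    flat_grads = np.concatenate(
        [p.grad.cpu().numpy().flatten() for p in network.parameters()])
    global_grads = np.zeros_like(flat_grads)
    # sum the gradients from all workers ...
    comm.Allreduce(flat_grads, global_grads, op=MPI.SUM)
    # ... then average instead of keeping the raw sum
    global_grads /= comm.Get_size()
    # write the averaged gradients back into the parameters
    start = 0
    for p in network.parameters():
        numel = p.grad.numel()
        p.grad.copy_(torch.as_tensor(
            global_grads[start:start + numel]).view_as(p.grad))
        start += numel
```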
@Ericonaldo Yes - I agree; we need to carry out more experiments to verify. We can use this channel to continue the discussion.
> I think the learning rates for both the policy network and the value network are important hyper-parameters for these goal-conditioned tasks; after fine-tuning some values I found that with...