Tianhong Dai
@jiameij Sorry for replying so late - it's a thread-blocking problem. I have solved it; you need to add `os.environ['OMP_NUM_THREADS'] = '1'`. I will revise it to pytorch-0.4.1 in...
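For reference, a minimal sketch of where that line has to go (the entry-point layout here is an assumption, not the repo's exact `train.py`): the environment variable must be set before `torch`/`numpy` are imported, otherwise each MPI worker can spawn multiple OpenMP threads and they block each other.

```python
# Sketch: limit each MPI worker to one OpenMP thread *before* importing torch.
import os
os.environ['OMP_NUM_THREADS'] = '1'   # avoid thread blocking across MPI workers
os.environ['MKL_NUM_THREADS'] = '1'   # (assumption) MKL usually needs the same limit

import torch  # imported only after the thread limits are in place

if __name__ == '__main__':
    print('intra-op threads per worker:', torch.get_num_threads())
```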
@LingfengTao Hi - Yes, you can use it to train the hand manipulation envs.
> Is the experience data collected during training stored on the hard disk or in memory? If it's in memory and the state space contains images, won't it fill up very quickly? --- A newcomer to RL asking for help

It is stored in memory; and indeed, if the state space contains images, memory fills up quickly. If you have to store images, you can try saving them in uint8 format to keep the memory usage as low as possible.
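A minimal sketch of that uint8 trick (this `ImageReplayBuffer` is hypothetical, not the repo's `replay_buffer.py`): store raw pixels as uint8 (1 byte per channel, 4x smaller than float32) and only convert to float when sampling a mini-batch.

```python
import numpy as np

class ImageReplayBuffer:
    def __init__(self, size, obs_shape=(84, 84, 3)):
        # uint8 storage: 1 byte per value instead of 4 for float32
        self.obs = np.zeros((size,) + obs_shape, dtype=np.uint8)
        self.size, self.ptr, self.full = size, 0, False

    def store(self, image):
        # assume the incoming image is already in the range [0, 255]
        self.obs[self.ptr] = image.astype(np.uint8)
        self.ptr = (self.ptr + 1) % self.size
        self.full = self.full or self.ptr == 0

    def sample(self, batch_size):
        high = self.size if self.full else self.ptr
        idx = np.random.randint(0, high, size=batch_size)
        # convert to float and normalize only for the sampled mini-batch
        return self.obs[idx].astype(np.float32) / 255.0
```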
@Ericonaldo Hi, actually `MPI = a large batch size`. Could I know what batch size (the larger batch size) you used when training the push task, please?
@Ericonaldo Hi - My guess is that it's because of the diversity of samples - before the agent updates the network, if you use a single process, in each epoch it will...
@Ericonaldo Hmm - that's a good point. An interesting finding is here: [https://github.com/TianhongDai/hindsight-experience-replay/blob/master/mpi_utils/mpi_utils.py#L21-L22](https://github.com/TianhongDai/hindsight-experience-replay/blob/master/mpi_utils/mpi_utils.py#L21-L22) . I follow OpenAI's setting; they use `sum` instead of `avg` to gather the gradients...
@Ericonaldo Yes - the HER implementation is quite tricky...
@Ericonaldo I found that the `SUM` operator influences the performance: [https://github.com/TianhongDai/hindsight-experience-replay/blob/master/mpi_utils/mpi_utils.py#L21-L22](https://github.com/TianhongDai/hindsight-experience-replay/blob/master/mpi_utils/mpi_utils.py#L21-L22) Here, instead of using `SUM`, I average the gradients according to the number of MPI workers as: ```python...
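A rough sketch of the change being described (the function name `sync_grads_avg` is made up for illustration and this is not the exact code in `mpi_utils.py`): all-reduce the flattened gradients with `SUM`, then divide by the number of MPI workers so the update uses the average gradient rather than the sum.

```python
from mpi4py import MPI
import numpy as np
import torch

def sync_grads_avg(network):
    comm = MPI.COMM_WORLD
    # flatten all local gradients into one vector
    flat_grads = np.concatenate(
        [p.grad.cpu().numpy().flatten() for p in network.parameters()])
    global_grads = np.zeros_like(flat_grads)
    # sum the gradients from all workers ...
    comm.Allreduce(flat_grads, global_grads, op=MPI.SUM)
    # ... then average instead of keeping the raw sum
    global_grads /= comm.Get_size()
    # write the averaged gradients back into the parameters
    start = 0
    for p in network.parameters():
        numel = p.grad.numel()
        p.grad.copy_(torch.as_tensor(
            global_grads[start:start + numel]).view_as(p.grad))
        start += numel
```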
@Ericonaldo Yes - I agree; we need to carry out more experiments to verify. We can use this channel to continue the discussion.
> I think the learning rates for both the policy network and the value network are important hyper-parameters for these goal-conditioned tasks; after fine-tuning some values I found that with...