ElegantRL
Massively Parallel Deep Reinforcement Learning. 🔥
@Yonv1943 Gym version 0.21 causes ```get_gym_env_args(gym.make("BipedalWalker-v3"), if_print=False)``` in **Part 3** of [tutorial_BipedalWalker_v3.ipynb](https://github.com/AI4Finance-Foundation/ElegantRL/blob/98b83959cc6e62f17e7c2ad104c5050e84bd7297/tutorial_BipedalWalker_v3.ipynb) to fail. Gym 0.21 cannot register BipedalWalker-v3 and LunarLanderContinuous-v2, while version 0.17.0 works fine. The install requirements [requirements.txt](https://github.com/AI4Finance-Foundation/ElegantRL/blob/98b83959cc6e62f17e7c2ad104c5050e84bd7297/requirements.txt)...
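For reference, a minimal reproduction of the failing call (the import path of `get_gym_env_args` is an assumption about the repository layout):

```python
import gym  # reported to fail with gym==0.21 and work with gym==0.17.0
from elegantrl.train.config import get_gym_env_args  # import path is an assumption

env = gym.make("BipedalWalker-v3")  # gym 0.21 cannot register this environment
env_args = get_gym_env_args(env, if_print=False)
```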
Note that [PR#202](https://github.com/AI4Finance-Foundation/ElegantRL/pull/202) replaced `list()` with `[]` to speed things up, but it also replaced every `list(map())` with `[map()]`, e.g. line 282 of [AgentBase.py](https://github.com/AI4Finance-Foundation/ElegantRL/blob/master/elegantrl/agents/AgentBase.py):

```python
traj_list1 = [map(list, zip(*traj_list))]  # state, reward, done, action, noise
```

This does not expand the iterator; it produces a single-element list containing the map object instead:

```python
>>> list(map(list, zip(*((1, 2, 3), (4, 5, 6)))))
[[1, 4], [2, 5], [3, 6]]
>>> [map(list, zip(*((1, 2, 3), (4, 5, 6))))]
[<map object at 0x7f...>]
```
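A minimal sketch of the corresponding fix (the `traj_list` value is illustrative): keep `list()` around `map()`, or unpack the iterator with `*`, so it is actually consumed:

```python
traj_list = [(1, 2, 3), (4, 5, 6)]             # illustrative stand-in
traj_list1 = list(map(list, zip(*traj_list)))  # [[1, 4], [2, 5], [3, 6]]
traj_list2 = [*map(list, zip(*traj_list))]     # equivalent and still eager
assert traj_list1 == traj_list2
```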
* run.py
  * line 91: `agent.state` --> `agent.states` is already set in `init_agent`, and `states` is used instead of `state`.
  * line 109: `buffer.update_buffer` --> already done in line 107.
* AgentBase.py...
In `obj_alpha = (self.alpha_log * (self.target_entropy - log_prob).detach()).mean()`, when `alpha_log=0`, alpha will be 1 forever. The correct way is `obj_alpha = (self.alpha * (self.target_entropy - log_prob).detach()).mean()`. This problem is also...
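Below is a minimal sketch comparing the two objectives (the tensors are illustrative stand-ins, not a real training batch); their gradients with respect to `alpha_log` coincide at `alpha_log = 0` but differ elsewhere by a factor of `alpha = exp(alpha_log)`:

```python
import torch

# Illustrative stand-ins for AgentSAC's tensors, not a real training batch.
alpha_log = torch.zeros(1, requires_grad=True)  # log(alpha); alpha = exp(0) = 1
target_entropy = -1.0
log_prob = torch.tensor([-2.0, -1.5])

# Current code: weight by alpha_log itself.
obj_v1 = (alpha_log * (target_entropy - log_prob).detach()).mean()
# Proposed fix: weight by alpha = exp(alpha_log).
obj_v2 = (alpha_log.exp() * (target_entropy - log_prob).detach()).mean()

g1, = torch.autograd.grad(obj_v1, alpha_log)
g2, = torch.autograd.grad(obj_v2, alpha_log)
# g2 = exp(alpha_log) * g1, so the two coincide only at alpha_log = 0.
```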
I cannot find this function.
Following is my code:

```python
import elegantrl.agents.AgentMADDPG
a1 = elegantrl.agents.AgentMADDPG.AgentMADDPG()
```

Output:

```
AgentMADDPG.py, in __init__: super().__init__()
TypeError: __init__() missing 3 required positional arguments: 'net_dims', 'state_dim', and 'action_dim'
```

It is due...
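The error pattern can be reproduced in isolation (the classes below are stand-ins, not the real ElegantRL code): the subclass calls a base `__init__` that requires three positional arguments without forwarding them:

```python
class AgentBase:
    def __init__(self, net_dims, state_dim, action_dim):
        self.net_dims, self.state_dim, self.action_dim = net_dims, state_dim, action_dim

class AgentMADDPG(AgentBase):
    def __init__(self):
        super().__init__()  # TypeError: missing 'net_dims', 'state_dim', 'action_dim'

AgentMADDPG()  # raises the TypeError shown above
```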
The PPO algorithm in `train_ppo_a2c_for_lunar_lander_continuous` does not seem to reproduce the learning curve exactly. To reproduce it exactly, is `env.seed(args.random_seed)` needed? I tried adding `env.seed(args.random_seed)`, but it did not seem to help much. Is the curve non-reproducible because of multi-threading, or for some other reason?
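For what it's worth, a minimal seeding sketch (assuming a single-process run and the classic gym API; `args.random_seed` is mirrored from the question as a plain `seed`) looks like this; with worker subprocesses the exploration order can still differ, so exact curve reproduction is not guaranteed:

```python
import random
import numpy as np
import torch
import gym

seed = 0  # stand-in for args.random_seed
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True  # deterministic GPU kernels
torch.backends.cudnn.benchmark = False

env = gym.make("LunarLanderContinuous-v2")
env.seed(seed)               # classic gym API; newer gym seeds via env.reset(seed=seed)
env.action_space.seed(seed)
```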
File "/home/moderngangster/Codes/APC-Flight/ElegantRL/examples/../elegantrl/agents/AgentSAC.py", line 43, in update_net obj_critic, state = self.get_obj_critic(buffer, self.batch_size) File "/home/moderngangster/Codes/APC-Flight/ElegantRL/examples/../elegantrl/agents/AgentSAC.py", line 81, in get_obj_critic_per states, actions, rewards, undones, next_ss, is_weights, is_indices = buffer.sample_for_per(batch_size) File "/home/moderngangster/Codes/APC-Flight/ElegantRL/examples/../elegantrl/train/replay_buffer.py", line 134,...
How is it advised to continue training a model from a checkpoint `.pt` file? I was able to build a simple process for predicting with a PyTorch-based structure, but I...
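Lacking an official resume recipe here, a generic PyTorch sketch follows, assuming the `.pt` file was written with `torch.save(model.state_dict(), path)`; the network and optimizer are placeholders for the agent's real actor/critic, and ElegantRL's own save format may differ:

```python
import torch
import torch.nn as nn

net = nn.Linear(8, 2)                                    # stand-in for the actor
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)  # fresh optimizer state

net.load_state_dict(torch.load("actor.pt", map_location="cpu"))
net.train()  # back to training mode before resuming the usual training loop
```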
In the function `explore_vec_env` of `AgentPPO`, the variable `actions` is shaped `[horizon_len, self.num_envs, 1]`, but the following expression `convert(action)` returns a 1-dim tensor of shape `num_envs`, which actually should...
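A minimal reproduction of the mismatch (the `squeeze` here is a hypothetical stand-in for whatever `convert(action)` does to the trailing dimension):

```python
import torch

num_envs = 4
action = torch.zeros(num_envs, 1)  # one step of `actions[t]`, shape [num_envs, 1]
converted = action.squeeze(1)      # what the issue observes: shape [num_envs]
fixed = converted.unsqueeze(1)     # one possible fix: restore [num_envs, 1]
assert fixed.shape == (num_envs, 1)
```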