
Gaes fees

[Open] crazypythonista opened this issue 2 years ago · 3 comments

Hello, I was trying to work this out on my end from scratch. I have gotten it to the point of training the model and visualizing it, but it seems to crash in the middle of the training session without saving the model.

Environment: Python 3.8.10, TensorFlow 2.3.1, Windows 11. Not using IDLE; running in script mode from a Windows PowerShell virtual env.

Below is the complete traceback of the error I received.

```
2022-03-07 04:17:43.095316: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1257] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-03-07 04:17:43.100610: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1263]
Traceback (most recent call last):
  File "RL-Bitcoin-trading-bot_7.py", line 501, in <module>
    train_multiprocessing(CustomEnv, agent, train_df, train_df_nomalized, num_worker = 5, training_batch_size=50, visualize=True, EPISODES=5)
  File "D:\Mine\RLCurrent\multiprocessing_env.py", line 95, in train_multiprocessing
    a_loss, c_loss = agent.replay(states[worker_id], actions[worker_id], rewards[worker_id], predictions[worker_id], dones[worker_id], next_states[worker_id])
  File "RL-Bitcoin-trading-bot_7.py", line 121, in replay
    advantages, target = self.get_gaes(rewards, dones, np.squeeze(values), np.squeeze(next_values))
  File "RL-Bitcoin-trading-bot_7.py", line 93, in get_gaes
    deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
  File "RL-Bitcoin-trading-bot_7.py", line 93, in <listcomp>
    deltas = [r + gamma * (1 - d) * nv - v for r, d, nv, v in zip(rewards, dones, next_values, values)]
TypeError: unsupported operand type(s) for +: 'NoneType' and 'float'
```
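From what I understand, the `TypeError` means one of the four sequences zipped together on line 93 (`rewards`, `dones`, `next_values`, `values`) contains a `None` where a number was expected. For context, here is a minimal sketch of a standard GAE routine matching that line; the `gamma`/`lamda` defaults and the explicit `None` check are assumptions I've added for illustration, not necessarily the tutorial's exact code:

```python
import numpy as np

def get_gaes(rewards, dones, values, next_values,
             gamma=0.99, lamda=0.95, normalize=True):
    """Standard Generalized Advantage Estimation, matching the line in
    the traceback. The None check is added here only to surface the
    failing input with a clearer message than the TypeError."""
    for name, seq in (("rewards", rewards), ("dones", dones),
                      ("values", values), ("next_values", next_values)):
        if any(x is None for x in np.atleast_1d(seq)):
            raise ValueError(f"{name} contains None -- inspect whatever produced it")

    # TD residuals: delta_t = r_t + gamma * (1 - done_t) * V(s_{t+1}) - V(s_t)
    deltas = [r + gamma * (1 - d) * nv - v
              for r, d, nv, v in zip(rewards, dones, next_values, values)]
    gaes = np.array(deltas, dtype=np.float32)

    # Backward pass: lambda-weighted, discounted sum of future residuals
    for t in reversed(range(len(deltas) - 1)):
        gaes[t] = gaes[t] + (1 - dones[t]) * gamma * lamda * gaes[t + 1]

    target = gaes + values  # regression targets for the critic
    if normalize:
        gaes = (gaes - gaes.mean()) / (gaes.std() + 1e-8)
    return np.vstack(gaes), np.vstack(target)
```

Since `values` and `next_values` are passed through `np.squeeze` first, the `None` most likely enters via the critic prediction or the worker queues.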

Any help is highly appreciated. If needed, I'll post code snippets as well for more clarity. Thanks.

crazypythonista avatar Mar 06 '22 22:03 crazypythonista

This is a duplicate of #18

HoaxParagon avatar Mar 06 '22 23:03 HoaxParagon

Also a duplicate of #9

HoaxParagon avatar Mar 07 '22 14:03 HoaxParagon

Hey, I think the problem originates from the output of critic_predict. I suspect that in the original PPO implementation the author had the critic model also watch the previously predicted value, but he removed that in this tutorial, so the critic model no longer takes the previous value as an input. Maybe you should try removing the np.zeros input in the critic_predict function.
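If that's the cause, the fix is just making the prediction call match how the critic model was actually built. A hypothetical before/after sketch, assuming a Keras model (the two-input form corresponds to a critic that also receives the previously predicted value):

```python
import numpy as np

# If the critic was built with two inputs, e.g.
#   Critic = Model(inputs=[state_input, old_values], outputs=value),
# then predict needs a dummy "previous value" array alongside the state:
def critic_predict_two_inputs(critic, state):
    return critic.predict([state, np.zeros((state.shape[0], 1))])

# If, as in this tutorial, the critic only takes the state, the
# np.zeros placeholder should be dropped:
def critic_predict(critic, state):
    return critic.predict(state)
```

Either way, it's worth confirming where the None actually enters (e.g. by printing next_values inside replay) before changing the model call.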

wanga10000 avatar Mar 14 '22 07:03 wanga10000