RLTrader
Found Inf or NaN global norm. : Tensor had Inf values
While optimize.py was running, I observed the following exception, but the process kept going...
[W 2019-06-08 17:58:27,948] Setting status of trial#14 as TrialState.FAIL because of the following error: InvalidArgumentError()
Traceback (most recent call last):
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had Inf values
[[{{node loss/VerifyFinite/CheckNumerics}}]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/optuna/study.py", line 399, in _run_trial
result = func(trial)
File "optimize.py", line 88, in optimize_agent
model.learn(evaluation_interval)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py", line 326, in learn
writer=writer, states=mb_states))
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py", line 257, in _train_step
td_map)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Found Inf or NaN global norm. : Tensor had Inf values
[[node loss/VerifyFinite/CheckNumerics (defined at /home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py:175) ]]
Caused by op 'loss/VerifyFinite/CheckNumerics', defined at:
File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
self._bootstrap_inner()
File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
self.run()
File "/usr/lib64/python3.6/threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python3.6/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/optuna/study.py", line 357, in func_child_thread
self._run_trial(func, catch)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/optuna/study.py", line 399, in _run_trial
result = func(trial)
File "optimize.py", line 81, in optimize_agent
tensorboard_log="./tensorboard", **model_params)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py", line 93, in __init__
self.setup_model()
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py", line 175, in setup_model
grads, _grad_norm = tf.clip_by_global_norm(grads, self.max_grad_norm)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/ops/clip_ops.py", line 271, in clip_by_global_norm
"Found Inf or NaN global norm.")
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/ops/numerics.py", line 44, in verify_tensor_all_finite
return verify_tensor_all_finite_v2(t, msg, name)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/ops/numerics.py", line 62, in verify_tensor_all_finite_v2
verify_input = array_ops.check_numerics(x, message=message)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 919, in check_numerics
"CheckNumerics", tensor=tensor, message=message, name=name)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
op_def=op_def)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
op_def=op_def)
File "/home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
self._traceback = tf_stack.extract_stack()
InvalidArgumentError (see above for traceback): Found Inf or NaN global norm. : Tensor had Inf values
[[node loss/VerifyFinite/CheckNumerics (defined at /home/zangetsu/proj/prometheus-core/demo/demo-12-bitcoin-trading-agent/venv/lib/python3.6/site-packages/stable_baselines/ppo2/ppo2.py:175) ]]
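The failing node is the finiteness check inside `tf.clip_by_global_norm`, which PPO2 calls at ppo2.py:175. As a rough illustration of what that check sees, here is a NumPy sketch (not TensorFlow's implementation, which differs in detail) of global-norm clipping: the global norm is the square root of the sum of the squared L2 norms of all gradient arrays, so a single inf or NaN in any gradient poisons it. The float32 case in the test shows that even gradients whose elements are all finite can produce an infinite norm once squaring overflows.

```python
import numpy as np


def global_norm(grads):
    """Square root of the sum of the squared L2 norms of all gradient arrays."""
    return float(np.sqrt(sum(np.sum(np.square(g)) for g in grads)))


def clip_by_global_norm(grads, clip_norm):
    """NumPy sketch of global-norm clipping with a finiteness check."""
    norm = global_norm(grads)
    if not np.isfinite(norm):
        # This is the condition TensorFlow reports as "Found Inf or NaN global norm."
        raise ValueError("Found Inf or NaN global norm.")
    # Scale all gradients by the same factor so the global norm is at most clip_norm.
    scale = clip_norm / max(norm, clip_norm)
    return [g * scale for g in grads], norm


grads = [np.array([3.0, 4.0]), np.array([0.0])]
clipped, norm = clip_by_global_norm(grads, clip_norm=0.5)
print(norm)  # 5.0
```

Because the check runs on the combined norm, the error message does not say which gradient (or which input) was non-finite; that has to be traced separately.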
I have also run into this error but have had no success debugging it. It is caused by Inf or NaN values making it into the model's network, though I am unsure how, since the observation space, action space, and reward space all actively replace NaN and ±inf with 0. Any ideas?
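One possible explanation, sketched below under the assumption that the observations are NumPy arrays: replacing NaN and ±inf only guarantees finite inputs. Large but finite values can still overflow later, for example inside an exponential. `sanitize` and `overflows` are hypothetical helpers for illustration, not RLTrader code; `np.nan_to_num` with explicit `nan`/`posinf`/`neginf` needs NumPy 1.17 or newer, and `np.errstate(over="raise")` turns a silent overflow into a `FloatingPointError` so it can be located.

```python
import numpy as np


def sanitize(x):
    """Hypothetical helper: replace NaN, +inf and -inf with 0 (needs NumPy >= 1.17)."""
    return np.nan_to_num(np.asarray(x, dtype=np.float64), nan=0.0, posinf=0.0, neginf=0.0)


def overflows(fn, *args):
    """Return True if fn(*args) triggers a floating-point overflow."""
    with np.errstate(over="raise"):
        try:
            fn(*args)
        except FloatingPointError:
            return True
    return False


# Inputs are fully sanitized: every element is finite afterwards...
obs = sanitize([1.0, np.nan, np.inf, -np.inf, 1000.0])
print(np.all(np.isfinite(obs)))  # True
# ...but exp(1000) still exceeds the float64 range, so a downstream step overflows.
print(overflows(np.exp, obs))  # True
```

The same reasoning applies inside the TensorFlow graph, where values are typically float32 and overflow at a much smaller magnitude.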
Hm, I need to do more tracing...
Hi,
We just released a guide in the documentation on tackling this type of issue. Feel free to open an issue on the stable-baselines repo if you find that the problem comes from the library.
@archenroot, has anyone debugged this further using VecCheckNan from stable-baselines?
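In stable-baselines, `VecCheckNan` wraps a vectorized environment and checks actions, observations, and rewards, so a non-finite value is reported at the step that produced it rather than later at the gradient clip (e.g. `VecCheckNan(env, raise_exception=True)`). The sketch below is not that wrapper; it is a minimal, framework-free illustration of the same idea, assuming a hypothetical environment with gym-style `reset()` and `step(action)` methods. `CheckNanEnv` and `ToyEnv` are made-up names.

```python
import numpy as np


class CheckNanEnv:
    """Hypothetical wrapper: raise on the first NaN/inf in actions, observations or rewards."""

    def __init__(self, env):
        self.env = env

    @staticmethod
    def _check(name, value):
        arr = np.asarray(value, dtype=np.float64)
        if not np.all(np.isfinite(arr)):
            raise ValueError(f"non-finite value in {name}: {arr}")
        return value

    def reset(self):
        return self._check("observation", self.env.reset())

    def step(self, action):
        # Check what the agent sends before the environment sees it.
        self._check("action", action)
        obs, reward, done, info = self.env.step(action)
        # Check what the environment returns before the agent trains on it.
        self._check("observation", obs)
        self._check("reward", reward)
        return obs, reward, done, info


class ToyEnv:
    """Toy environment whose reward becomes inf after two steps."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(3)

    def step(self, action):
        self.t += 1
        reward = np.inf if self.t > 2 else 1.0
        return np.full(3, float(self.t)), reward, False, {}
```

With `env = CheckNanEnv(ToyEnv())`, the first two calls to `env.step(...)` succeed and the third raises `ValueError` on the reward, pointing directly at the source instead of at `loss/VerifyFinite/CheckNumerics`.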