Problem with running GAIL on HalfCheetah-v2
Hi, I am trying to run GAIL on the HalfCheetah-v2 environment. To do so, I followed the Pendulum example in the GAIL section of the documentation. However, I get the following error when I create the GAIL model. There seems to be a type incompatibility in the network, but I converted every array in the expert dataset to float32 and the error is still there. Could you help with this issue?
I am using the following versions:
TensorFlow 1.14.0, gym 0.15.4, MuJoCo 200, Ubuntu 18.10
ValueError Traceback (most recent call last) ~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords) 526 as_ref=input_arg.is_ref, --> 527 preferred_dtype=default_dtype) 528 except TypeError as err:
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx, accept_symbolic_tensors, accept_composite_tensors) 1223 if ret is None: -> 1224 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref) 1225
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in _TensorTensorConversionFunction(t, dtype, name, as_ref) 1017 "Tensor conversion requested dtype %s for Tensor with dtype %s: %r" % -> 1018 (dtype.name, t.dtype.name, str(t))) 1019 return t
ValueError: Tensor conversion requested dtype float64 for Tensor with dtype float32: 'Tensor("adversary/obfilter/Cast:0", shape=(17,), dtype=float32)'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/stable_baselines/gail/model.py in __init__(self, policy, env, expert_dataset, hidden_size_adversary, adversary_entcoeff, g_step, d_step, d_stepsize, verbose, _init_setup_model, **kwargs) 47 48 if _init_setup_model: ---> 49 self.setup_model() 50 51 def learn(self, total_timesteps, callback=None, seed=None, log_interval=100, tb_log_name="GAIL",
~/anaconda3/lib/python3.7/site-packages/stable_baselines/trpo_mpi/trpo_mpi.py in setup_model(self) 124 self.reward_giver = TransitionClassifier(self.observation_space, self.action_space, 125 self.hidden_size_adversary, --> 126 entcoeff=self.adversary_entcoeff) 127 128 # Construct network for new policy
~/anaconda3/lib/python3.7/site-packages/stable_baselines/gail/adversary.py in __init__(self, observation_space, action_space, hidden_size, entcoeff, scope, normalize) 75 name="expert_actions_ph") 76 # Build graph ---> 77 generator_logits = self.build_graph(self.generator_obs_ph, self.generator_acs_ph, reuse=False) 78 expert_logits = self.build_graph(self.expert_obs_ph, self.expert_acs_ph, reuse=True) 79 # Build accuracy
~/anaconda3/lib/python3.7/site-packages/stable_baselines/gail/adversary.py in build_graph(self, obs_ph, acs_ph, reuse) 119 with tf.variable_scope("obfilter"): 120 self.obs_rms = RunningMeanStd(shape=self.observation_shape) --> 121 obs = (obs_ph - self.obs_rms.mean) / self.obs_rms.std 122 else: 123 obs = obs_ph
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y) 882 with ops.name_scope(None, op_name, [x, y]) as name: 883 if isinstance(x, ops.Tensor) and isinstance(y, ops.Tensor): --> 884 return func(x, y, name=name) 885 elif not isinstance(y, sparse_tensor.SparseTensor): 886 try:
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py in sub(x, y, name) 10853 # Add nodes to the TensorFlow graph. 10854 _, _, _op = _op_def_lib._apply_op_helper( > 10855 "Sub", x=x, y=y, name=name) 10856 _result = _op.outputs[:] 10857 _inputs_flat = _op.inputs
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords) 561 "%s type %s of argument '%s'." % 562 (prefix, dtypes.as_dtype(attrs[input_arg.type_attr]).name, --> 563 inferred_from[input_arg.type_attr])) 564 565 types = [values.dtype]
TypeError: Input 'y' of 'Sub' Op has type float32 that does not match type float64 of argument 'x'.
Hello, please fill in the issue template completely, and format the code block and error stack using markdown (there is an example in the template).
EDIT: maybe related to https://github.com/hill-a/stable-baselines/issues/603
I am trying to run GAIL on the HalfCheetah-v2 environment. To do so, I followed the Pendulum example in the GAIL section of the documentation. However, I get the following error when I create the GAIL model. There seems to be a type incompatibility in the network, but I converted every array in the expert dataset to float32 and the error is still there. Could you help with this issue?
Code Example halfCheetah_gail.zip
Error messages and stack traces
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
526 as_ref=input_arg.is_ref,
--> 527 preferred_dtype=default_dtype)
528 except TypeError as err:
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx, accept_symbolic_tensors, accept_composite_tensors)
1223 if ret is None:
-> 1224 ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
1225
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in _TensorTensorConversionFunction(t, dtype, name, as_ref)
1017 "Tensor conversion requested dtype %s for Tensor with dtype %s: %r" %
-> 1018 (dtype.name, t.dtype.name, str(t)))
1019 return t
ValueError: Tensor conversion requested dtype float64 for Tensor with dtype float32: 'Tensor("adversary/obfilter/Cast:0", shape=(17,), dtype=float32)'
During handling of the above exception, another exception occurred:
TypeError Traceback (most recent call last)
<ipython-input-5-c677b4cecc61> in <module>
1 env = gym.make('HalfCheetah-v2')
2
----> 3 model = GAIL('MlpPolicy', env ,dataset, verbose=1)
~/anaconda3/lib/python3.7/site-packages/stable_baselines/gail/model.py in __init__(self, policy, env, expert_dataset, hidden_size_adversary, adversary_entcoeff, g_step, d_step, d_stepsize, verbose, _init_setup_model, **kwargs)
47
48 if _init_setup_model:
---> 49 self.setup_model()
50
51 def learn(self, total_timesteps, callback=None, seed=None, log_interval=100, tb_log_name="GAIL",
~/anaconda3/lib/python3.7/site-packages/stable_baselines/trpo_mpi/trpo_mpi.py in setup_model(self)
124 self.reward_giver = TransitionClassifier(self.observation_space, self.action_space,
125 self.hidden_size_adversary,
--> 126 entcoeff=self.adversary_entcoeff)
127
128 # Construct network for new policy
~/anaconda3/lib/python3.7/site-packages/stable_baselines/gail/adversary.py in __init__(self, observation_space, action_space, hidden_size, entcoeff, scope, normalize)
75 name="expert_actions_ph")
76 # Build graph
---> 77 generator_logits = self.build_graph(self.generator_obs_ph, self.generator_acs_ph, reuse=False)
78 expert_logits = self.build_graph(self.expert_obs_ph, self.expert_acs_ph, reuse=True)
79 # Build accuracy
~/anaconda3/lib/python3.7/site-packages/stable_baselines/gail/adversary.py in build_graph(self, obs_ph, acs_ph, reuse)
119 with tf.variable_scope("obfilter"):
120 self.obs_rms = RunningMeanStd(shape=self.observation_shape)
--> 121 obs = (obs_ph - self.obs_rms.mean) / self.obs_rms.std
122 else:
123 obs = obs_ph
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y)
882 with ops.name_scope(None, op_name, [x, y]) as name:
883 if isinstance(x, ops.Tensor) and isinstance(y, ops.Tensor):
--> 884 return func(x, y, name=name)
885 elif not isinstance(y, sparse_tensor.SparseTensor):
886 try:
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py in sub(x, y, name)
10853 # Add nodes to the TensorFlow graph.
10854 _, _, _op = _op_def_lib._apply_op_helper(
> 10855 "Sub", x=x, y=y, name=name)
10856 _result = _op.outputs[:]
10857 _inputs_flat = _op.inputs
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
561 "%s type %s of argument '%s'." %
562 (prefix, dtypes.as_dtype(attrs[input_arg.type_attr]).name,
--> 563 inferred_from[input_arg.type_attr]))
564
565 types = [values.dtype]
TypeError: Input 'y' of 'Sub' Op has type float32 that does not match type float64 of argument 'x'.
System Info: Python 3.7, TensorFlow 1.14.0, gym 0.15.4, MuJoCo 200, Ubuntu 18.10
Hello, next time please use markdown rather than a zip file. Here is minimal code to reproduce the error (I got a different one):
import pybullet_envs  # imported for its side effect: registers the PyBullet environments with gym
from stable_baselines.gail import ExpertDataset, generate_expert_traj
from stable_baselines import GAIL, SAC

# env_id = 'Pendulum-v0' # works
# env_id = 'HalfCheetahBulletEnv-v0' # works
env_id = 'HalfCheetah-v2' # fails due to float64 to float32 conversion

# Record 3 episodes from an untrained SAC model to serve as expert data
expert_model = SAC('MlpPolicy', env_id, verbose=1)
traj_data = generate_expert_traj(expert_model, 'expert_model', n_episodes=3)
expert_dataset = ExpertDataset(traj_data=traj_data, sequential_preprocessing=True)

model = GAIL('MlpPolicy', env_id, expert_dataset=expert_dataset, verbose=1)
model.learn(1000)
Error:
ValueError: Tensor conversion requested dtype float64 for Tensor with dtype float32: 'Tensor("adversary/obfilter/Cast:0", shape=(17,), dtype=float32)'
It seems that the issue comes from a float64-to-float32 conversion. I don't know why, but the dtype of the observation space is now float64 rather than float32.
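One way to confirm this is to compare the dtype the environment declares with the dtype it actually returns. The helper below is a minimal sketch and not part of stable-baselines or gym: describe_observation_dtypes is a hypothetical name, and it assumes the older gym API (as in gym 0.15.4) in which reset() returns only the observation.

```python
import numpy as np


def describe_observation_dtypes(env):
    """Report the declared and the actual observation dtypes of a gym-style env.

    Assumption: env.reset() returns only the observation (older gym API).
    """
    # dtype advertised by the observation space
    declared = np.dtype(env.observation_space.dtype)
    # dtype of an observation the environment really produces
    actual = np.asarray(env.reset()).dtype
    return {"declared": declared.name, "actual": actual.name}
```

Calling it on an environment created with gym.make(env_id) shows whether the declared dtype is float64 for a given environment id.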
For a quick fix, you can create an observation wrapper that converts the observations to float32.
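Such a wrapper could look like the sketch below. Float32Observation is a hypothetical name, and the sketch assumes a Box observation space. It subclasses gym.ObservationWrapper and overrides both the declared space and the returned observations, so that code reading observation_space.dtype sees float32. It has not been tested against GAIL here.

```python
import gym
import numpy as np
from gym import spaces


class Float32Observation(gym.ObservationWrapper):
    """Hypothetical wrapper: cast Box observations and their declared space to float32."""

    def __init__(self, env):
        super().__init__(env)
        space = env.observation_space
        # Re-declare the space as float32, keeping the original bounds and shape.
        self.observation_space = spaces.Box(
            low=space.low.astype(np.float32),
            high=space.high.astype(np.float32),
            dtype=np.float32,
        )

    def observation(self, observation):
        # Applied by ObservationWrapper to every observation from reset() and step().
        return np.asarray(observation, dtype=np.float32)
```

To use it, wrap the environment first, for example env = Float32Observation(gym.make('HalfCheetah-v2')), and pass env to the model instead of the environment id.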
Anyway, I would recommend using the imitation learning repo, which is based on stable-baselines and actively maintained.
Hello @maryam-bandali, I just fixed this issue for the Reacher environment by changing the observation data type to float32. Go to the environment's .py file and, in the code that builds the observation (probably the _get_obs method), cast the result to float32:
def _get_obs(self):
    theta = self.sim.data.qpos.flat[:2]
    return np.concatenate(
        [
            np.cos(theta),
            np.sin(theta),
            self.sim.data.qpos.flat[2:],
            self.sim.data.qvel.flat[:2],
            self.get_body_com("fingertip") - self.get_body_com("target"),
        ]
    ).astype('float32')