stable-baselines Problem with running GAIL on HalfCheetah-v2

Hi, I am trying to run GAIL on the HalfCheetah-v2 environment. To do so, I adapted the Pendulum example from the GAIL section of the documentation. However, I get the following error when I create the GAIL model. It looks like a dtype mismatch inside the network, but I tried converting the whole expert dataset to float32 and the error is still there. Could you help with this issue?

I am using the following versions:

tensorflow 1.14.0, gym 0.15.4, Mujoco 200, Ubuntu 18.10

ValueError: Tensor conversion requested dtype float64 for Tensor with dtype float32: 'Tensor("adversary/obfilter/Cast:0", shape=(17,), dtype=float32)'

During handling of the above exception, another exception occurred:

TypeError: Input 'y' of 'Sub' Op has type float32 that does not match type float64 of argument 'x'.

Jan 02 '20 11:01 maryam-bandali

Hello, please fill in the issue template completely, and format the code block and error stack using markdown (there is an example in the template).

EDIT: maybe related to https://github.com/hill-a/stable-baselines/issues/603

Jan 02 '20 11:01 araffin

Code Example: halfCheetah_gail.zip

Error messages and stack traces

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
    526                 as_ref=input_arg.is_ref,
--> 527                 preferred_dtype=default_dtype)
    528           except TypeError as err:

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in internal_convert_to_tensor(value, dtype, name, as_ref, preferred_dtype, ctx, accept_symbolic_tensors, accept_composite_tensors)
   1223     if ret is None:
-> 1224       ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
   1225 

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/ops.py in _TensorTensorConversionFunction(t, dtype, name, as_ref)
   1017         "Tensor conversion requested dtype %s for Tensor with dtype %s: %r" %
-> 1018         (dtype.name, t.dtype.name, str(t)))
   1019   return t

ValueError: Tensor conversion requested dtype float64 for Tensor with dtype float32: 'Tensor("adversary/obfilter/Cast:0", shape=(17,), dtype=float32)'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
<ipython-input-5-c677b4cecc61> in <module>
      1 env = gym.make('HalfCheetah-v2')
      2 
----> 3 model = GAIL('MlpPolicy', env ,dataset, verbose=1)

~/anaconda3/lib/python3.7/site-packages/stable_baselines/gail/model.py in __init__(self, policy, env, expert_dataset, hidden_size_adversary, adversary_entcoeff, g_step, d_step, d_stepsize, verbose, _init_setup_model, **kwargs)
     47 
     48         if _init_setup_model:
---> 49             self.setup_model()
     50 
     51     def learn(self, total_timesteps, callback=None, seed=None, log_interval=100, tb_log_name="GAIL",

~/anaconda3/lib/python3.7/site-packages/stable_baselines/trpo_mpi/trpo_mpi.py in setup_model(self)
    124                     self.reward_giver = TransitionClassifier(self.observation_space, self.action_space,
    125                                                              self.hidden_size_adversary,
--> 126                                                              entcoeff=self.adversary_entcoeff)
    127 
    128                 # Construct network for new policy

~/anaconda3/lib/python3.7/site-packages/stable_baselines/gail/adversary.py in __init__(self, observation_space, action_space, hidden_size, entcoeff, scope, normalize)
     75                                             name="expert_actions_ph")
     76         # Build graph
---> 77         generator_logits = self.build_graph(self.generator_obs_ph, self.generator_acs_ph, reuse=False)
     78         expert_logits = self.build_graph(self.expert_obs_ph, self.expert_acs_ph, reuse=True)
     79         # Build accuracy

~/anaconda3/lib/python3.7/site-packages/stable_baselines/gail/adversary.py in build_graph(self, obs_ph, acs_ph, reuse)
    119                 with tf.variable_scope("obfilter"):
    120                     self.obs_rms = RunningMeanStd(shape=self.observation_shape)
--> 121                 obs = (obs_ph - self.obs_rms.mean) / self.obs_rms.std
    122             else:
    123                 obs = obs_ph

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py in binary_op_wrapper(x, y)
    882     with ops.name_scope(None, op_name, [x, y]) as name:
    883       if isinstance(x, ops.Tensor) and isinstance(y, ops.Tensor):
--> 884         return func(x, y, name=name)
    885       elif not isinstance(y, sparse_tensor.SparseTensor):
    886         try:

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/gen_math_ops.py in sub(x, y, name)
  10853   # Add nodes to the TensorFlow graph.
  10854   _, _, _op = _op_def_lib._apply_op_helper(
> 10855         "Sub", x=x, y=y, name=name)
  10856   _result = _op.outputs[:]
  10857   _inputs_flat = _op.inputs

~/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py in _apply_op_helper(self, op_type_name, name, **keywords)
    561                   "%s type %s of argument '%s'." %
    562                   (prefix, dtypes.as_dtype(attrs[input_arg.type_attr]).name,
--> 563                    inferred_from[input_arg.type_attr]))
    564 
    565           types = [values.dtype]

TypeError: Input 'y' of 'Sub' Op has type float32 that does not match type float64 of argument 'x'.

System Info: Python 3.7, Tensorflow 1.14.0, gym 0.15.4, Mujoco 200, Ubuntu 18.10

Jan 02 '20 14:01 maryam-bandali

Hello, next time please use markdown and not a zip file. Here is minimal code to reproduce the error (I got a different one):


import pybullet_envs  # importing registers the *BulletEnv-v0 environments with gym

from stable_baselines.gail import ExpertDataset, generate_expert_traj
from stable_baselines import GAIL, SAC

# env_id = 'Pendulum-v0'  # works
# env_id = 'HalfCheetahBulletEnv-v0'  # works
env_id = 'HalfCheetah-v2'  # fails due to float64 to float32 conversion

# Untrained SAC model standing in for the expert; enough to reproduce the error
expert_model = SAC('MlpPolicy', env_id, verbose=1)
# Record 3 episodes and save them to expert_model.npz
traj_data = generate_expert_traj(expert_model, 'expert_model', n_episodes=3)

expert_dataset = ExpertDataset(traj_data=traj_data, sequential_preprocessing=True)

# The traceback in the report above is raised here, in the GAIL constructor
model = GAIL('MlpPolicy', env_id, expert_dataset=expert_dataset, verbose=1)

model.learn(1000)

Error:

ValueError: Tensor conversion requested dtype float64 for Tensor with dtype float32: 'Tensor("adversary/obfilter/Cast:0", shape=(17,), dtype=float32)'

It seems that the issue comes from a float64/float32 mismatch. I don't know why, but the dtype of the observation space is now float64 and not float32. As a quick fix, you can create an observation wrapper that converts the observations to float32. In any case, I would recommend using the imitation learning repo, which is based on stable-baselines and actively maintained.
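A minimal sketch of such an observation wrapper, assuming the observation space is a `gym.spaces.Box` (as it is for HalfCheetah-v2). The class name `Float32Observation` is made up for this example and is not part of gym or stable-baselines. It casts each observation and also replaces the advertised observation space with a float32 one, since the error message suggests the mismatch is already present when the graph is built from the space's dtype.

```python
import gym
import numpy as np
from gym import spaces


class Float32Observation(gym.ObservationWrapper):
    """Hypothetical helper: expose observations and their space as float32."""

    def __init__(self, env):
        super().__init__(env)
        old_space = env.observation_space
        # Assumption: only Box observation spaces are handled in this sketch.
        assert isinstance(old_space, spaces.Box), "only Box spaces are handled"
        # Rebuild the space with the same bounds but a float32 dtype.
        self.observation_space = spaces.Box(
            low=old_space.low.astype(np.float32),
            high=old_space.high.astype(np.float32),
            dtype=np.float32,
        )

    def observation(self, observation):
        # Called by ObservationWrapper on every observation the env returns.
        return np.asarray(observation, dtype=np.float32)


# Usage sketch (needs MuJoCo installed, so left as comments):
# env = Float32Observation(gym.make('HalfCheetah-v2'))
# model = GAIL('MlpPolicy', env, expert_dataset=expert_dataset, verbose=1)
```

The expert trajectories should presumably be recorded with the same wrapped environment, so that the expert data and the GAIL policy see the same dtype.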

Jan 05 '20 12:01 araffin

Hello @maryam-bandali, I just fixed this issue for the Reacher environment by changing the observation data type to float32. Go to the environment's .py file and, in the section that builds the observation, cast the result to float32.

def _get_obs(self):
    theta = self.sim.data.qpos.flat[:2]
    return np.concatenate(
        [
            np.cos(theta),
            np.sin(theta),
            self.sim.data.qpos.flat[2:],
            self.sim.data.qvel.flat[:2],
            self.get_body_com("fingertip") - self.get_body_com("target"),
        ]
    ).astype('float32')
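The same cast can probably be applied without editing the installed gym files, by subclassing the environment class and overriding `_get_obs`. The sketch below assumes the environment builds its observation in a `_get_obs` method, as in the snippet above. The helper name `with_float32_obs` is invented for this example.

```python
import numpy as np


def with_float32_obs(env_cls):
    """Hypothetical helper: subclass env_cls so _get_obs() returns float32."""

    class Float32Env(env_cls):
        def _get_obs(self):
            # Reuse the original observation code, then cast the result.
            return np.asarray(super()._get_obs(), dtype=np.float32)

    Float32Env.__name__ = "Float32" + env_cls.__name__
    return Float32Env


# Usage sketch (needs MuJoCo installed, so left as comments). Note that
# instantiating the class directly skips the wrappers added by gym.make,
# such as the episode time limit.
# from gym.envs.mujoco.half_cheetah import HalfCheetahEnv
# env = with_float32_obs(HalfCheetahEnv)()
```

Compared with patching site-packages, this keeps the change in your own code, so it survives a reinstall of gym.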

Oct 31 '21 10:10 mstkocyigit