tf2rl Implement AIRL

Learning Robust Rewards with Adversarial Inverse Reinforcement Learning
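For readers new to the paper: AIRL trains a discriminator of the form D(s, a, s') = exp f(s, a, s') / (exp f(s, a, s') + π(a|s)), where f(s, a, s') = g(s, a) + γ h(s') - h(s). The policy is rewarded with log D - log(1 - D), which simplifies to f - log π(a|s). The sketch below only evaluates these scalar quantities. It is not tf2rl's implementation, and the function names are hypothetical.

```python
import math

def airl_f(g_sa, h_s, h_next_s, gamma=0.99):
    """Reward approximator f(s, a, s') = g(s, a) + gamma * h(s') - h(s)."""
    return g_sa + gamma * h_next_s - h_s

def airl_discriminator(f, log_pi):
    """D = exp(f) / (exp(f) + pi(a|s)), rewritten as sigmoid(f - log pi)."""
    logit = f - log_pi
    # Branch on the sign so math.exp never overflows
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)

def airl_policy_reward(f, log_pi):
    """log D - log(1 - D), which simplifies to f - log pi(a|s)."""
    return f - log_pi
```

Writing D as a sigmoid of f - log π keeps the computation stable when f is large. It also makes the reward identity easy to see.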

Jun 02 '19 17:06 keiohta

Test code

# Generate trajectories
$ python examples/run_sac.py --env-name HalfCheetah-v2 --save-test-path --test-interval 50000 --gpu -1
$ ls results
20191220T185529.974847_SAC_

$ python examples/run_airl_sac.py --env-name HalfCheetah-v2 --test-interval 10000 --gpu -1 --expert-path-dir results/20191220T185529.974847_SAC_
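The expert directory above was copied by hand from the output of ls results. Here is a minimal sketch of picking the newest SAC run automatically. It assumes the run names start with a fixed-width timestamp like 20191220T185529.974847, so sorting them as strings also sorts them by time. The helper names and the "_SAC_" tag are illustrative and not part of tf2rl.

```python
import os

def pick_latest(run_names, tag="_SAC_"):
    """Return the newest run name containing tag.

    Assumes names begin with a fixed-width timestamp (e.g. 20191220T185529.974847),
    so lexicographic order matches chronological order.
    """
    matches = [name for name in run_names if tag in name]
    if not matches:
        raise FileNotFoundError(f"no run directory containing {tag!r}")
    return max(matches)

def latest_expert_dir(results_root="results", tag="_SAC_"):
    """Path of the newest matching run under results_root, usable for --expert-path-dir."""
    names = [d for d in os.listdir(results_root)
             if os.path.isdir(os.path.join(results_root, d))]
    return os.path.join(results_root, pick_latest(names, tag))
```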

Dec 19 '19 09:12 keiohta

Hi @keiohta, when I run

$ python ~/tf2rl-master/examples/run_gaifo_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir ~/GAIL/results/20200619T013740.036943_SAC_ --gpu -1 --dir-suffix GAIfO

I get:

run_gaifo_ddpg.py: error: unrecognized arguments: --gpu -1

Can you help me? Thank you!

Jun 27 '20 10:06 haoyu-x

@haoyu-x Hi! Thanks for reporting the bug. I fixed the error in this commit: https://github.com/keiohta/tf2rl/commit/ab675d0e8f7061910e8f44d00daf72c69c72db6a. Can you try the latest master branch again?
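For context, argparse reports "unrecognized arguments" when it receives a flag that no parser has declared. The sketch below reproduces the message from the report with a hypothetical parser. It is not tf2rl's actual parser and does not show what the commit changed. It only shows that declaring --gpu is enough for "--gpu -1" to parse.

```python
import argparse

def make_parser(with_gpu):
    """Hypothetical stand-in for an example script's parser (not tf2rl's code)."""
    parser = argparse.ArgumentParser(prog="run_gaifo_ddpg.py")
    parser.add_argument("--env-name", type=str, default="HalfCheetah-v2")
    parser.add_argument("--expert-path-dir", type=str, default=None)
    parser.add_argument("--dir-suffix", type=str, default="")
    if with_gpu:
        # Without this declaration, "--gpu -1" ends up as unrecognized extras
        parser.add_argument("--gpu", type=int, default=0)
    return parser

argv = ["--env-name=HalfCheetah-v2", "--gpu", "-1", "--dir-suffix", "GAIfO"]
```

Calling make_parser(False).parse_args(argv) exits with "unrecognized arguments: --gpu -1". With make_parser(True), args.gpu is -1. Because the parser has no options that look like negative numbers, argparse treats "-1" as a value and not as a flag.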

Jun 27 '20 11:06 keiohta

Should I still use the same command suggested in issue #67 (https://github.com/keiohta/tf2rl/issues/67)?

When I run

$ python ~/tf2rl-master/examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir ~/GAIL/results/20200619T013740.036943_SAC_ --gpu -1 --dir-suffix GAIL

I get the same error.


Jun 27 '20 12:06 haoyu-x

Yes. Did you update the code?

Jun 27 '20 13:06 keiohta

Yes, I updated it. Can you run GAIL and GAIfO on your computer?


Jun 27 '20 13:06 haoyu-x

At least I resolved the --gpu error. Let me check whether the full code runs.

Jun 27 '20 13:06 keiohta

Is there any way to run GAIL and GAIfO other than from the command line?


Jun 27 '20 13:06 haoyu-x

I confirmed the script runs on my machine. Can you provide me with the full error message?

$ python examples/run_sac.py --env-name=HalfCheetah-v2 --save-test-path --test-interval=50000 --max-steps 300000
$ ls results
20200627T221712.423081_SAC_
$ find results/20200627T221712.423081_SAC_/ -name '*.pkl'
results/20200627T221712.423081_SAC_/step_00050000_epi_02_return_02744.1677.pkl
results/20200627T221712.423081_SAC_/step_00050000_epi_04_return_02701.9388.pkl
results/20200627T221712.423081_SAC_/step_00050000_epi_00_return_03121.5797.pkl
results/20200627T221712.423081_SAC_/step_00050000_epi_01_return_02784.6256.pkl
results/20200627T221712.423081_SAC_/step_00050000_epi_03_return_02752.4279.pkl

$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir results/20200627T221712.423081_SAC_/ --gpu -1
...
22:23:48.107 [INFO] (irl_trainer.py:74) Total Epi:    19 Steps:   19000 Episode Steps:  1000 Return:  1174.4017 FPS: 118.79
22:23:56.162 [INFO] (irl_trainer.py:74) Total Epi:    20 Steps:   20000 Episode Steps:  1000 Return:  1889.9691 FPS: 124.15
22:23:57.861 [INFO] (irl_trainer.py:118) Evaluation Total Steps:   20000 Average Reward  2278.0820 over  5 episodes
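The file names printed by find above appear to encode the training step, the test-episode index, and that episode's return. Here is a minimal sketch of reading those fields back and choosing the highest-return episode. It assumes the naming pattern seen in this thread holds in general. Allowing a minus sign for negative returns is an extra assumption, and the helper names are hypothetical.

```python
import os
import re

# Pattern observed in the file names above; the optional minus sign is an assumption
_TRAJ_RE = re.compile(r"step_(\d+)_epi_(\d+)_return_(-?[\d.]+)\.pkl")

def parse_traj_name(path):
    """Return (step, episode, return) encoded in a saved trajectory file name."""
    m = _TRAJ_RE.fullmatch(os.path.basename(path))
    if m is None:
        raise ValueError(f"unexpected trajectory file name: {path}")
    step, epi, ret = m.groups()
    return int(step), int(epi), float(ret)

def best_trajectory(paths):
    """Path of the saved episode with the highest recorded return."""
    return max(paths, key=lambda p: parse_traj_name(p)[2])
```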

Jun 27 '20 13:06 keiohta

[image: Screenshot from 2020-06-27 21-38-20.png]


Jun 27 '20 13:06 haoyu-x

Oh, I assumed you had installed tf2rl in development (editable) mode... I had not published my change to PyPI yet, so I am doing that now.

Jun 27 '20 13:06 keiohta

Sure. Please let me know what I should do after your change. Thank you very much!


Jun 27 '20 13:06 haoyu-x

Now you can get the latest code through PyPI. Can you try the following?

# Update tf2rl
$ pip install -U tf2rl
# Make sure the version is 0.1.14
$ pip list | grep tf2rl

# Run your script
$ python ~/tf2rl-master/examples/run_gaifo_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir ~/GAIL/results/20200619T013740.036943_SAC_ --gpu -1 --dir-suffix GAIfO

By the way, your path ~/tf2rl-master suggests that you did not install tf2rl with git clone but just downloaded the zip file, right? Either way, the commands above let you check the installed version, so please let me know if you still encounter the same problem.
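As an alternative to pip list | grep tf2rl, here is a minimal sketch of checking the installed version from Python with the standard library's importlib.metadata (Python 3.8+). It assumes plain "X.Y.Z" version strings. The helper names are hypothetical.

```python
from importlib import metadata

def installed_version(package="tf2rl"):
    """Installed version string, or None if the package is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

def version_tuple(version):
    # Assumes a plain "X.Y.Z" release string (no suffixes such as "rc1" or ".dev0")
    return tuple(int(part) for part in version.split("."))

def is_at_least(installed, required="0.1.14"):
    """True if installed >= required, compared numerically part by part."""
    return installed is not None and version_tuple(installed) >= version_tuple(required)
```

Comparing tuples is important here. As strings, "0.1.9" would sort after "0.1.14", but as tuples (0, 1, 9) < (0, 1, 14). For example, is_at_least(installed_version()) checks the current environment.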

Jun 27 '20 13:06 keiohta

Problem fixed, but I'm encountering another issue. :(


Jun 27 '20 13:06 haoyu-x

[image: Screenshot from 2020-06-27 21-54-05.png]


Jun 27 '20 13:06 haoyu-x

I cannot see your screenshot. Can you copy the message or retry uploading the picture?

Jun 27 '20 14:06 keiohta

Sure.

21:56:03.468 [INFO] (irl_trainer.py:74) Total Epi: 7 Steps: 7000 Episode Steps: 1000 Return: -327.7823 FPS: 4416.74
21:56:03.713 [INFO] (irl_trainer.py:74) Total Epi: 8 Steps: 8000 Episode Steps: 1000 Return: -262.8208 FPS: 4088.41
21:56:03.955 [INFO] (irl_trainer.py:74) Total Epi: 9 Steps: 9000 Episode Steps: 1000 Return: -325.9061 FPS: 4149.77
21:56:04.268 [INFO] (irl_trainer.py:74) Total Epi: 10 Steps: 10000 Episode Steps: 1000 Return: -278.5830 FPS: 4176.82
Traceback (most recent call last):
  File "/home/haoyux/tf2rl-master/examples/run_gaifo_ddpg.py", line 43, in <module>
    trainer()
  File "/home/haoyux/venv/lib/python3.6/site-packages/tf2rl/experiments/irl_trainer.py", line 113, in __call__
    expert_next_states=self._expert_next_obs[indices])
  File "/home/haoyux/venv/lib/python3.6/site-packages/tf2rl/algos/gaifo.py", line 48, in train
    agent_states, agent_next_states, expert_states, expert_next_states)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 627, in _call
    self._initialize(args, kwds, add_initializers_to=initializers)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 506, in _initialize
    *args, **kwds))
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collected
    graph_function, _, _ = self._maybe_define_function(args, kwargs)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2667, in _create_graph_function
    capture_by_value=self._capture_by_value),
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 441, in wrapped_fn
    return weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 3299, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
    raise e.ag_error_metadata.to_exception(e)
ValueError: in user code:

    /home/haoyux/venv/lib/python3.6/site-packages/tf2rl/algos/gaifo.py:58 _train_body  *
        real_logits = self.disc([expert_states, expert_next_states])
    /home/haoyux/venv/lib/python3.6/site-packages/tf2rl/algos/gail.py:29 call  *
        features = self.l1(features)
    /home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py:886 __call__  **
        self.name)
    /home/haoyux/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/input_spec.py:216 assert_input_compatibility
        ' but received input with shape ' + str(shape))

    ValueError: Input 0 of layer L1 is incompatible with the layer: expected axis -1 of input shape to have value 34 but received input with shape [32, 6]

(venv) haoyux@haoyux-ThinkPad:~$

[image: Screenshot from 2020-06-27 21-54-05.png]


Jun 27 '20 14:06 haoyu-x

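I guess you collected the expert transitions on a different environment, such as Pendulum-v0. The discriminator expects 34 = 2 × 17 input features (state and next state of HalfCheetah-v2, whose observation has 17 dimensions) but received 6 = 2 × 3, and the state dimension of Pendulum-v0 is 3. Are you sure the expert data were collected on HalfCheetah-v2?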
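The diagnosis above is simple arithmetic. The traceback shows GAIfO's discriminator receiving [expert_states, expert_next_states], so its input width is twice the observation size. Here is a minimal sketch that does the same inference and adds a pre-training check to catch the mismatch before TensorFlow raises. The observation sizes are those of the two Gym environments named in this thread. The helper names are hypothetical, and this is not part of tf2rl.

```python
# Observation sizes of the Gym environments mentioned in this thread
OBS_DIM = {"HalfCheetah-v2": 17, "Pendulum-v0": 3}

def gaifo_disc_width(obs_dim):
    """GAIfO's discriminator sees [s, s'] side by side, so its width is 2 * obs_dim."""
    return 2 * obs_dim

def envs_matching_width(width, obs_dims=OBS_DIM):
    """Environments whose [s, s'] width equals the width the layer actually received."""
    return [name for name, dim in obs_dims.items() if gaifo_disc_width(dim) == width]

def check_expert_obs(expert_obs, env_obs_dim):
    """Fail before training if expert observations do not match the environment."""
    dims = {len(obs) for obs in expert_obs}
    if dims != {env_obs_dim}:
        raise ValueError(f"expert observations have dims {sorted(dims)}, "
                         f"but the environment expects {env_obs_dim}")
```

Here envs_matching_width(34) points to HalfCheetah-v2, and envs_matching_width(6) points to Pendulum-v0. That matches the shapes in the error message.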

Jun 27 '20 14:06 keiohta

Oh! I made a stupid mistake. Thank you, Kei, everything is fine now!


Jun 27 '20 14:06 haoyu-x

One last question: how can I make a TensorBoard figure like yours from the command line? [image: Screenshot from 2020-06-27 22-28-13.png]


Jun 27 '20 14:06 haoyu-x

Great to hear your script runs successfully! I cannot see your picture again... I just run:

$ tensorboard --logdir results

Does this answer your question?

Jun 27 '20 14:06 keiohta

I mean, how can I visualize the training process with TensorBoard, like the figure in https://github.com/keiohta/tf2rl/issues/67?


Jun 27 '20 14:06 haoyu-x

You can add a suffix to the results directory name with the --dir-suffix option. #67 uses it like this:

$ python examples/run_gail_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir results/20191213T203858.508559_SAC_ --gpu -1 --dir-suffix GAIL
$ python examples/run_gaifo_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir results/20191213T203858.508559_SAC_ --gpu -1 --dir-suffix GAIfO
$ python examples/run_vail_ddpg.py --env-name=HalfCheetah-v2 --expert-path-dir results/20191213T203858.508559_SAC_ --gpu -1 --dir-suffix VAIL
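This also relates to the earlier question about running GAIL and GAIfO without typing each command. Here is a minimal sketch that launches the three runs above from Python with subprocess, using only the flags shown in this thread. The expert directory is the one from #67 and should be replaced with your own. With dry_run=True, it only prints the commands. Because each run gets its own --dir-suffix, the runs should appear as separately named entries when you run tensorboard --logdir results.

```python
import shlex
import subprocess

# Expert run from #67; replace with your own results directory
EXPERT_DIR = "results/20191213T203858.508559_SAC_"
# Suffix -> example script, in the order used above
RUNS = {"GAIL": "run_gail_ddpg.py", "GAIfO": "run_gaifo_ddpg.py", "VAIL": "run_vail_ddpg.py"}

def build_command(script, suffix, expert_dir=EXPERT_DIR, env_name="HalfCheetah-v2"):
    """Argument list equivalent to one of the shell commands above."""
    return ["python", f"examples/{script}", f"--env-name={env_name}",
            "--expert-path-dir", expert_dir, "--gpu", "-1", "--dir-suffix", suffix]

def run_all(dry_run=True):
    """Print (dry_run=True) or execute the runs one after another."""
    cmds = [build_command(script, suffix) for suffix, script in RUNS.items()]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True stops at the first run that exits with an error
            subprocess.run(cmd, check=True)
    return cmds
```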

Jun 27 '20 14:06 keiohta

Yes! Thank you!


Jun 27 '20 14:06 haoyu-x

My pleasure! Please don't hesitate to open an issue if you run into any difficulty or have a question. I'll close this issue now. Thanks for the report!

Jun 27 '20 14:06 keiohta

OMG, this issue is not related to your question, so I have to reopen it. It would be better to open a new issue if your question is not related to the original one ;)

Jun 27 '20 15:06 keiohta

Thank you again!


Jun 27 '20 15:06 haoyu-x

Hi Kei,

I'm using tf2rl's GAIfO on robosuite (https://github.com/gal-leibovich/robosuite), but I get this error: mujoco_py.builder.MujocoException: Unknown warning type Time = 1.3900.Check for NaN in simulation. I found that my policy network outputs the action [nan nan nan nan nan nan nan nan] after several episodes of training. This happens on robosuite every time, but everything works well on gym. Could you offer some help? Thank you!
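This does not address the root cause, but it can help with debugging. Here is a minimal sketch of a guard that fails immediately with context when a value becomes NaN or infinite, before MuJoCo reports it as a simulation warning. You could call it on each observation and action just before env.step(action) in your own rollout loop. That loop and the helper name are hypothetical and not part of tf2rl.

```python
import math

def check_finite(values, what, step):
    """Raise with context if any entry is NaN or +/-inf; otherwise return values unchanged."""
    bad = [i for i, v in enumerate(values) if not math.isfinite(v)]
    if bad:
        raise FloatingPointError(
            f"non-finite {what} at step {step}, indices {bad}: {list(values)}")
    return values
```

Checking observations as well as actions shows whether the NaN first appears in the environment's output or in the policy's output.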


Jul 05 '20 20:07 haoyu-x

Hi @haoyu-x,

Could you open a new issue?

This is the issue where developers track and discuss the AIRL implementation.

As far as I can tell, your problem is not related to the main topic of this issue.

Jul 05 '20 21:07 ymd-h

Thanks, @yamada-github-account. @haoyu-x, yes, I also think it would be better to open a new issue for this.

Jul 06 '20 10:07 keiohta
