ppo_cpp
Adjustments needed to run my own task
Hi Antymon, thanks very much for the repo that lets people export a trained agent from Python to C++.
I would like to confirm the changes I have to make in order to run my own task:
- Make my environment inherit from the Env abstract class under env/env.hpp
- Modify the main ppo2.cpp, which creates an instance of an environment and passes it to PPO
I have two questions:
a. If I only want to run inference, should I call algorithm.eval(obs) directly? It seems to use get_deterministic_action(), and I also noticed step(const tensorflow::Tensor& obs) and value(const tensorflow::Tensor& obs), but I can't tell exactly what the differences between them are.
b. Is there a way to resume training with your implementation, so that online learning can be achieved?
- Create my own computational graph and potentially make some small modifications to the core algorithm if using more involved policies (the current implementation supports only MLP policies). Graph generation is mentioned below.
I used 'tf.train.export_meta_graph(graph=model.graph, filename='my-model.meta', clear_devices=True, clear_extraneous_savers=True, strip_default_attrs=True)' as mentioned here, but the generated model contains many redundant tensors when I visualize it. Some say this is because parts used only for training are preserved. Have you encountered this situation, and how did you solve it?
Thank you again!
Hi,
Before I start: this code was largely ported from Python with minimal effort in mind, so don't assume every part was well thought through, and feel free to change anything you like.
1,2) Yeah that should be it.
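To make 1) a bit more concrete, below is a minimal sketch of what a task-specific environment could look like. Note that the stand-in Env interface and all method names/signatures here are assumptions for illustration only; the real abstract class in env/env.hpp is the reference.

```cpp
#include <tuple>
#include <vector>

// Stand-in for the repo's Env abstract class (the actual virtual methods in
// env/env.hpp will differ; this only illustrates the shape of a subclass).
struct Env {
    virtual ~Env() = default;
    virtual std::vector<float> reset() = 0;
    virtual std::tuple<std::vector<float>, float, bool>
    step(const std::vector<float>& action) = 0;
};

class MyTaskEnv : public Env {
public:
    std::vector<float> reset() override {
        state_.assign(4, 0.f);  // 4 is an arbitrary observation size for the sketch
        return state_;
    }

    std::tuple<std::vector<float>, float, bool>
    step(const std::vector<float>& action) override {
        // Task-specific dynamics, reward, and termination condition go here.
        float reward = 0.f;
        bool done = false;
        return {state_, reward, done};
    }

private:
    std::vector<float> state_;
};

// In a modified ppo2.cpp main() you would then construct the instance and hand
// it to the PPO object, roughly (exact class name/constructor per ppo2.cpp):
//   MyTaskEnv env;
//   /* PPO */ algorithm{env, /* graph path, hyperparameters, ... */};
```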
a1) Eval() should be fine to get an action, but perhaps look into playback() in ppo.cpp for inspiration on how I used it in conjunction with stepping through the environment. As far as I remember, playback() is called when you ask for inference from the CLI (I think I described this in the readme.md, perhaps worth reading that part).
a2) Generally, PPO outputs stochastic actions, but I chose to use only the means during inference. You can change that if you like, not sure what your needs are. I am sure there is a stochastic counterpart in the graph at least.
a3) As for value and step, look at the fetches: step gets you the action (among other outputs), and value gets you the value estimate. But you are right that they are confusing. Again, mindless porting.
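To make the distinction concrete, here is a rough inference loop in the spirit of playback(). It reuses the hypothetical MyTaskEnv from the sketch above; the return types of eval()/step()/value() and the exact tensor layout are assumptions, so check ppo.cpp for the real signatures before copying anything.

```cpp
#include <tensorflow/core/framework/tensor.h>
#include <tuple>
#include <vector>

// The algorithm's concrete type is left as a template parameter because the
// exact class name and return types should be taken from ppo.cpp, not from here.
template <typename PpoAlgorithm>
void run_inference(PpoAlgorithm& algorithm, MyTaskEnv& env, int max_steps) {
    std::vector<float> observation = env.reset();

    for (int t = 0; t < max_steps; ++t) {
        // Pack the observation into a [1, obs_dim] float tensor (assumed layout).
        const long long obs_dim = static_cast<long long>(observation.size());
        tensorflow::Tensor obs(tensorflow::DT_FLOAT,
                               tensorflow::TensorShape({1, obs_dim}));
        for (int i = 0; i < static_cast<int>(observation.size()); ++i) {
            obs.flat<float>()(i) = observation[i];
        }

        // eval(): deterministic action (policy mean) -- enough for plain inference.
        // step(): also fetches the extra training-time outputs (value, neglogp, ...).
        // value(): fetches only the value estimate.
        std::vector<float> action = algorithm.eval(obs);  // assumed return type

        bool done = false;
        std::tie(observation, std::ignore, done) = env.step(action);
        if (done) break;
    }
}
```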
b) If you use the latest StableBaselines you are likely to get into trouble, as I never updated my stuff. What I would recommend is to use the modified version which I host and refer to in the Singularity files. I think there is also an explicit repository listing in the readme.md.
Hope this helps, good luck!