
How to run BlackDROPS with GP-MI

urnotmeeto opened this issue 4 years ago · 4 comments

Have you implemented the BlackDROPS with GP-MI algorithm that was proposed in your ICRA 2018 paper in this repo? I am very interested in that idea and wondering how to replicate your experimental results.

urnotmeeto avatar May 27 '20 05:05 urnotmeeto

First of all, thank you for your interest.

Have you implemented the BlackDROPS with GP-MI algorithm that was proposed in your ICRA 2018 paper in this repo?

Yes and no. Yes because we have already implemented the GP-MI optimization procedure (see here), but no because we haven't included an example usage.

Let me create an example in the cartpole scenario (which is easy and fast to do), and I will ping you. Give me until the 15th of June, as I have a few urgent things to finish before then.

costashatz avatar May 27 '20 09:05 costashatz

That would be great! I'll check the optimization procedure before you release an example. Thank you very much!

urnotmeeto avatar May 28 '20 05:05 urnotmeeto

@urnotmeeto sorry for being almost a month late, but lots of things came up.

I have created a branch with an example of using GP-MI with the cartpole: gp_mi_example. Compile everything and then run the example with: `./deps/limbo/build/exp/blackdrops/src/classic_control/cartpole_mi_simu -m -1 -r 1 -n 10 -b 5 -e 1 -u -s`. Replace `simu` with `graphic` to visualize what is going on. I am still debugging it for possible errors/mistakes (there is something fishy going on in the initial optimization), but it should be a good enough starting point, and I did not want to keep you waiting any longer.

The process starts by optimizing the mean model first (with an initial guess of the mean's optimization variables, different from the actual system), and then proceeds with the normal loop of optimizing the model and then the policy given the model. Beware that both the model optimization and the policy optimization will take much longer, since we call the mean function every time we query the model.
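In case it helps, the overall flow can be sketched as follows. This is an illustrative Python sketch of the loop just described, not the repo's actual C++ API; all function names and the toy numerics are hypothetical stand-ins:

```python
# Hypothetical sketch of the BlackDROPS-with-GP-MI loop.
# None of these functions exist in the repo; they only mirror the
# structure described above.

def optimize_mean_model(mean_params, data):
    """Fit the parametric mean function to observed transitions (toy update)."""
    avg = sum(data) / len(data)
    return [0.5 * (p + avg) for p in mean_params]

def optimize_model(mean_params, data):
    """Learn the GP model on top of the optimized mean (toy stand-in)."""
    return {"mean": mean_params, "data": list(data)}

def optimize_policy(model, policy):
    """Optimize the policy against the learned model (toy update).
    Each model query evaluates the mean function, hence the extra cost."""
    return [p + 0.1 for p in policy]

def run_episode(policy):
    """Execute the policy on the real system and record data (toy stand-in)."""
    return [sum(policy)]

# Initial guess for the mean's optimization variables
# (deliberately different from the actual system).
mean_params = [0.0, 0.0]
policy = [0.0, 0.0]
data = run_episode(policy)

# Step 1: optimize the mean model first.
mean_params = optimize_mean_model(mean_params, data)

# Step 2: normal loop: optimize the model, then the policy given the model.
for episode in range(10):
    model = optimize_model(mean_params, data)
    policy = optimize_policy(model, policy)
    data += run_episode(policy)
```

The point of the structure is that the mean function is baked into every model query, which is why both the model and policy optimization steps are noticeably slower than in plain BlackDROPS.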

costashatz avatar Jul 10 '20 10:07 costashatz

@costashatz Great! I'll check it. Thank you!

urnotmeeto avatar Jul 13 '20 02:07 urnotmeeto