ecole Environment Event

Describe the problem or improvement suggested

Currently, given for instance a reward function, the process of calling reset and obtain_reward is not transparent to the user. Is When is reset called? Before of after a the dynamics are themselves reset? When writing a new function, it also leads to confusions. Should reset return a reward or will obtain_reward be called right after? Furthermore, this current setting offers no possibility to access the Model outside of very specific places. Finally, the code for calling the callback is very similar between observation functions, reward functions, and soon info functions.

To alleviate this, we could generalize the idea of callback event, to reset_pre, reset_post, step_pre, step_post, seed_pre, seed_post. Ideally, such a callback would be created by simply defining a method a the same name.

struct AnyEvent: Event {
    void reset_pre(scip::Model  const&) override { /* Brand new model not yet on initial state */ }
};

An reward function would then become an event, with a __call__/operator() method for the reward.

struct LpIterations: RewardFunction, Event {
    void reset_pre(scip::Model const&) {  last_lp_iter = 0; }

    Reward operator()(scip::Model model const&) {
        auto reward = ... - last_lp_iter;
        return reward;
    }

private:
    double last_lp_iter = 0;
};

Describe the solution you would like

Base event class has all callback implemented with nothing being done.
Make event virtual to limit code duplication with Python? Only useful if we are able to call them from C++. In Dynamics?
override on scip::Model& and scip::Model const&

Describe alternatives you have considered

__call__/operator() becomes redundant with reset_post and step_post, so should it simply be a simple getter, that guarantee to return the same on two successive calls.
A more general scheme with arbitrary events that could be registered and fired at runtime (e.g. Scip events, or pytorch ignite events). However, this seem to get in the way of simple use case and it is not clear what use cases it would serve here.

Jul 06 '20 20:07 AntoinePrv

I like the idea, makes the code more explicit as to when reward / observation functions are called. That could apply as well to observation functions right ? Here are my comments:

couldn't RewardFunction already inherit from Event ? So that struct LpIterations: RewardFunction, Event { simplifies to struct LpIterations: RewardFunction {
not sure about the getter function, Reward operator()(scip::Model model const&) {. It is still not clear when this guy is called. Maybe the following: put all the logic for computing the reward in reset_pre(model), reset_post(model), step_pre(model), step_post(model), and then only have a Reward operator()() { getter, without argument, to recover the current reward anytime? The API is self-explaining, still short (5 methods), and very explicit.

Jul 07 '20 14:07 gasse

That could apply as well to observation functions right

Definitely the idea is also to remove code duplication.

I like your ideas, removing the argument from the getter is indeed much less confusing! Not sure what name to give it though. What feels more naural, reward = reward_function(), or reward = reward_function.get_reward()

Jul 07 '20 16:07 AntoinePrv

ecole ecole copied to clipboard

Environment Event

Describe the problem or improvement suggested

Describe the solution you would like

Describe alternatives you have considered

ecole
ecole copied to clipboard