ecole
                                
                                 ecole copied to clipboard
                                
                                    ecole copied to clipboard
                            
                            
                            
                        Environment Event
Describe the problem or improvement suggested
Currently, given for instance a reward function, the process of calling reset and obtain_reward is not transparent to the user. Is When is reset called? Before of after a the dynamics are themselves reset?
When writing a new function, it also leads to confusions. Should reset return a reward or will obtain_reward be called right after?
Furthermore, this current setting offers no possibility to access the Model outside of very specific places.
Finally, the code for calling the callback is very similar between observation functions, reward functions, and soon info functions.
To alleviate this, we could generalize the idea of callback event, to reset_pre, reset_post, step_pre, step_post, seed_pre, seed_post.
Ideally, such a callback would be created by simply defining a method a the same name.
struct AnyEvent: Event {
    void reset_pre(scip::Model  const&) override { /* Brand new model not yet on initial state */ }
};
An reward function would then become an event, with a __call__/operator() method for the reward.
struct LpIterations: RewardFunction, Event {
    void reset_pre(scip::Model const&) {  last_lp_iter = 0; }
    Reward operator()(scip::Model model const&) {
        auto reward = ... - last_lp_iter;
        return reward;
    }
private:
    double last_lp_iter = 0;
};
Describe the solution you would like
- Base event class has all callback implemented with nothing being done.
- Make event virtual to limit code duplication with Python? Only useful if we are able to call them from C++. In Dynamics?
- overrideon- scip::Model&and- scip::Model const&
Describe alternatives you have considered
- __call__/- operator()becomes redundant with- reset_postand- step_post, so should it simply be a simple getter, that guarantee to return the same on two successive calls.
- A more general scheme with arbitrary events that could be registered and fired at runtime (e.g. Scip events, or pytorch ignite events). However, this seem to get in the way of simple use case and it is not clear what use cases it would serve here.
I like the idea, makes the code more explicit as to when reward / observation functions are called. That could apply as well to observation functions right ? Here are my comments:
- couldn't RewardFunctionalready inherit fromEvent? So thatstruct LpIterations: RewardFunction, Event {simplifies tostruct LpIterations: RewardFunction {
- not sure about the getter function, Reward operator()(scip::Model model const&) {. It is still not clear when this guy is called. Maybe the following: put all the logic for computing the reward inreset_pre(model),reset_post(model),step_pre(model),step_post(model), and then only have aReward operator()() {getter, without argument, to recover the current reward anytime? The API is self-explaining, still short (5 methods), and very explicit.
That could apply as well to observation functions right
Definitely the idea is also to remove code duplication.
I like your ideas, removing the argument from the getter is indeed much less confusing! Not sure what name to give it though.
What feels more naural, reward = reward_function(), or reward = reward_function.get_reward()