vowpal_wabbit [wip] experimenting with tests for bad squarecb predictions issue

Open ataymano opened this issue 3 years ago • 2 comments

Feb 01 '22 06:02 ataymano

I wanted to ask your opinion on this approach. The goal is to write tests that assert specific invariants, rather than checking stdout as we do now; it looks like we really need some tooling for that. For example, here we can check that each of many different exploration strategies (adding more is almost just a config change) produces valid probabilities. Cons:

1. It introduces a dependency on vw_executor, which is probably not great. We can replace it with plain pyvw, but the code will become a bit more verbose.
2. It would be better to replace checked-in data files, like the one I have here, with code that generates them. Happy to do it if we decide to proceed.

In general, this is just a quick draft illustrating the approach. I'm happy to hear opinions on whether we should continue in this direction and then polish the details, and about other ways of writing similar tests (in case I'm missing something).
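To make the "valid probabilities" invariant concrete, here is a minimal sketch of what such a check could look like. It is not the code from this draft: `pmf_violations`, `check_all`, and the `predict` callback are hypothetical names, and the argument list is only an illustrative selection of VW exploration flags. The `predict` callback stands in for whatever actually runs VW (vw_executor or pyvw) and returns one list of action probabilities per example.

```python
import math
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# Illustrative exploration configs (assumption: not the list used in the draft).
EXPLORATION_ARGS = [
    "--cb_explore_adf --epsilon 0.1",
    "--cb_explore_adf --squarecb",
    "--cb_explore_adf --softmax --lambda 1",
    "--cb_explore_adf --bag 5",
    "--cb_explore_adf --cover 3",
]


def pmf_violations(pmf: Sequence[float], tol: float = 1e-6) -> List[str]:
    """Return a list of invariant violations; an empty list means the pmf is valid."""
    if len(pmf) == 0:
        return ["empty prediction"]
    problems = []
    for i, p in enumerate(pmf):
        if not math.isfinite(p):
            problems.append(f"action {i}: non-finite probability {p}")
        elif p < -tol or p > 1.0 + tol:
            problems.append(f"action {i}: probability {p} outside [0, 1]")
    total = sum(pmf)
    # Written so that a NaN total also fails the check.
    if not (abs(total - 1.0) <= tol):
        problems.append(f"probabilities sum to {total}, expected 1")
    return problems


def check_all(
    predict: Callable[[str], Iterable[Sequence[float]]],
    arg_sets: Sequence[str] = EXPLORATION_ARGS,
) -> Dict[Tuple[str, int], List[str]]:
    """Run `predict(args)` for every config and collect violations
    keyed by (args, example index)."""
    failures = {}
    for args in arg_sets:
        for idx, pmf in enumerate(predict(args)):
            problems = pmf_violations(pmf)
            if problems:
                failures[(args, idx)] = problems
    return failures
```

A test would then assert that `check_all(predict)` returns an empty dict. Adding another exploration strategy is one more string in `EXPLORATION_ARGS`, which is the "almost a config change" property mentioned above.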

Feb 03 '22 17:02 ataymano

There's some benefit in implementing some of these checks as a test/debug reduction, so that they could be turned on for debugging at any point. The downside is that they would have to be implemented in C++.

Feb 08 '22 00:02 lalo

Continuing to work on this idea in the fork.

Nov 17 '22 19:11 ataymano
