ramp-workflow icon indicating copy to clipboard operation
ramp-workflow copied to clipboard

Disable bagging if asked

Open albertcthomas opened this issue 3 years ago • 7 comments

Sometimes, the bagging CV scores can be slow and/or lead to a MemoryError. This of course depends on the dataset, the problem and the machine I use.

This PR adds a no-bagging flag to the ramp-test command to avoid computing the bagging CV scores. @agramfort is also interested in this for an upcoming challenge I believe.

albertcthomas avatar Apr 10 '21 10:04 albertcthomas

Codecov Report

Merging #263 (a2a359f) into master (e97a272) will increase coverage by 0.01%. The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #263      +/-   ##
==========================================
+ Coverage   82.13%   82.15%   +0.01%     
==========================================
  Files         133      133              
  Lines        4938     4942       +4     
==========================================
+ Hits         4056     4060       +4     
  Misses        882      882              
Impacted Files Coverage Δ
rampwf/utils/cli/testing.py 62.50% <100.00%> (+0.96%) :arrow_up:
rampwf/utils/testing.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update e97a272...a2a359f. Read the comment docs.

codecov[bot] avatar Apr 10 '21 10:04 codecov[bot]

@maikia can you check if it helps with our problems? thx

thx a lot @albertcthomas for the patch.

agramfort avatar Apr 10 '21 12:04 agramfort

The patch is fine, thx @albertcthomas. The problem I see for ramp-board is that the event will need to know about the flag for 1) setting --no-bagging when ramp-test is called on the server and 2) taking care of using the mean score on the leaderboards. I see two solutions: 1) adding a bagging field to the Event table in the DB which needs to be set on the update_event form or 2) putting the bagging field in problem.py. 1 is somewhat cleaner, but it requires migration. 2 would make sense because i) the attribute belongs to the problem, not to the event, from a DB point of view, and problem.py is in fact functioning as the Problem table in the DB, ii) it would not require modifying the ramp-test script in the worker. But 2 would also mean that we should re-think what we do in this PR: do we have both a CI option and the field in problem.py, or only the latter.

In both cases, we'll need to code taking a score from the mean instead of the csv containing the bagged scores

kegl avatar Apr 11 '21 10:04 kegl

I would also like to urge ourselves to merge advanced into master so we can get back maintaining only one branch.

kegl avatar Apr 11 '21 10:04 kegl

Great, thanks @albertcthomas for taking care of this. I will try to find some time this week to test it on our problem and will give you feedback

maikia avatar Apr 11 '21 19:04 maikia

@albertcthomas your solution works well for us when using ramp-test. I have no further comments on that part.

As for how to use it by the ramp-board, I would opt for the 2nd option of @kegl It seems more straight forward from the implementation and from the user perspective

maikia avatar Apr 18 '21 19:04 maikia

OK @maikia @albertcthomas in this case we should add this to this PR: having a

no-bagging = True

in problem.py should have the same effect as the no-bagging option of ramp-test.

Then the only thing to implement in ramp-board is to catch when there are no bagged results, and use the mean instead in the leaderboard.

kegl avatar Apr 18 '21 19:04 kegl