ramp-workflow
ramp-workflow copied to clipboard
Disable bagging if asked
Sometimes, the bagging CV scores can be slow and/or lead to a MemoryError. This of course depends on the dataset, the problem and the machine I use.
This PR adds a no-bagging
flag to the ramp-test
command to avoid computing the bagging CV scores.
@agramfort is also interested in this for an upcoming challenge I believe.
Codecov Report
Merging #263 (a2a359f) into master (e97a272) will increase coverage by
0.01%
. The diff coverage is100.00%
.
@@ Coverage Diff @@
## master #263 +/- ##
==========================================
+ Coverage 82.13% 82.15% +0.01%
==========================================
Files 133 133
Lines 4938 4942 +4
==========================================
+ Hits 4056 4060 +4
Misses 882 882
Impacted Files | Coverage Δ | |
---|---|---|
rampwf/utils/cli/testing.py | 62.50% <100.00%> (+0.96%) |
:arrow_up: |
rampwf/utils/testing.py | 100.00% <100.00%> (ø) |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update e97a272...a2a359f. Read the comment docs.
@maikia can you check if it helps with our problems? thx
thx a lot @albertcthomas for the patch.
The patch is fine, thx @albertcthomas. The problem I see for ramp-board is that the event will need to know about the flag for 1) setting --no-bagging when ramp-test is called on the server and 2) taking care of using the mean score on the leaderboards. I see two solutions: 1) adding a bagging field to the Event table in the DB which needs to be set on the update_event form or 2) putting the bagging field in problem.py. 1 is somewhat cleaner, but it requires migration. 2 would make sense because i) the attribute belongs to the problem, not to the event, from a DB point of view, and problem.py is in fact functioning as the Problem table in the DB, ii) it would not require modifying the ramp-test script in the worker. But 2 would also mean that we should re-think what we do in this PR: do we have both a CI option and the field in problem.py, or only the latter.
In both cases, we'll need to code taking a score from the mean instead of the csv containing the bagged scores
I would also like to urge ourselves to merge advanced into master so we can get back maintaining only one branch.
Great, thanks @albertcthomas for taking care of this. I will try to find some time this week to test it on our problem and will give you feedback
@albertcthomas your solution works well for us when using ramp-test
. I have no further comments on that part.
As for how to use it by the ramp-board, I would opt for the 2nd option of @kegl It seems more straight forward from the implementation and from the user perspective
OK @maikia @albertcthomas in this case we should add this to this PR: having a
no-bagging = True
in problem.py should have the same effect as the no-bagging
option of ramp-test
.
Then the only thing to implement in ramp-board is to catch when there are no bagged results, and use the mean instead in the leaderboard.