disco
Add end-to-end testing
Document internally the key metrics for some of the standard tasks (e.g. Titanic), and check them by hand before and after any change that could affect them. This can be done by a human and does not need to be automated; it matters in addition to the unit tests. Concretely:
- test accuracy when training alone (a single client)
- test accuracy for 2 clients, with the same dataset strictly partitioned between the two
No code is needed for now, but we do need clear reference values, the file splits used, and documentation of the expected results in a README or similar (or on the task itself).
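A minimal sketch of how the strictly partitioned client files could be produced reproducibly. The file names (`titanic_train.csv`, `titanic_client1.csv`, `titanic_client2.csv`) and the even/odd row rule are assumptions for illustration, not part of disco:

```ts
// split_clients.ts — illustrative sketch, not part of the disco codebase.
// Splits a Titanic-style CSV into two strictly disjoint client shards so the
// 2-client run can be rebuilt identically before and after a code change.
import * as fs from "fs";

const [header, ...rows] = fs
  .readFileSync("titanic_train.csv", "utf8") // assumed input file name
  .trim()
  .split("\n");

// Deterministic partition: even-indexed rows go to client 1, odd to client 2.
const client1 = rows.filter((_, i) => i % 2 === 0);
const client2 = rows.filter((_, i) => i % 2 === 1);

fs.writeFileSync("titanic_client1.csv", [header, ...client1].join("\n") + "\n");
fs.writeFileSync("titanic_client2.csv", [header, ...client2].join("\n") + "\n");

console.log(`client1: ${client1.length} rows, client2: ${client2.length} rows`);
```

Any deterministic rule would do; the point is that the split is written down (or scripted) so the documented accuracies stay comparable across runs.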
Ideally we would also evaluate both models on a completely separate holdout set, removed from the training data before it is given to disco.
A new issue will be opened to support validation on a separate holdout set.
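A sketch of carving out such a holdout before any training data is handed to disco. The 20% ratio and the file names are assumptions chosen for illustration:

```ts
// holdout.ts — illustrative sketch; disco never sees the holdout rows.
import * as fs from "fs";

const HOLDOUT_FRACTION = 0.2; // assumed ratio, not specified in the issue

const [header, ...rows] = fs
  .readFileSync("titanic_train.csv", "utf8") // assumed input file name
  .trim()
  .split("\n");

// First 80% of rows stay as training data (to be partitioned for the clients),
// the last 20% are set aside to score both models by hand.
const cut = Math.floor(rows.length * (1 - HOLDOUT_FRACTION));
fs.writeFileSync("titanic_training.csv", [header, ...rows.slice(0, cut)].join("\n") + "\n");
fs.writeFileSync("titanic_holdout.csv", [header, ...rows.slice(cut)].join("\n") + "\n");
```

With `titanic_holdout.csv` kept outside the training pipeline, the single-client model and the 2-client model can be scored on exactly the same rows.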