Stronger CI for maintaining model consistency
The unit tests largely ensure that the modules run and create reasonable results. They are not sensitive to silent failures in model performance. For example, I could write a poor pull request that only returns a portion of the predictions, and the current tests wouldn't be able to spot that. We need both visual diagnostics that can be reviewed when predictions change and a better understanding of the expected results for a given image.
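As a rough sketch of the kind of check I mean (the image path, expected count, and exact DeepForest calls are placeholders rather than a worked-out design, assuming the current `use_release`/`predict_image` API):

```python
from deepforest import main


def test_predictions_are_complete():
    # A benchmark image whose tree count we know from manual annotation.
    # A pull request that silently drops part of the predictions would
    # fail here, whereas the current smoke tests would still pass.
    m = main.deepforest()
    m.use_release()
    boxes = m.predict_image(path="tests/data/benchmark_image.tif", return_plot=False)

    # Placeholder threshold; the real value would come from the annotations.
    assert len(boxes) >= 50
```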
I would like to push forward on this issue. I am imagining a set of tools that allow us to track precision and recall against a benchmark set. We will need something small that can be downloaded; this relates to https://github.com/weecology/DeepForest/issues/474. The use case here is to prevent pull requests from hurting model performance. It would just be an early warning system.
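Concretely, something like the sketch below: evaluate against a small downloadable annotation set and compare precision and recall to a stored baseline in CI. The file names, tolerance, and result keys here are assumptions for illustration, not a final design.

```python
import json

from deepforest import main

# Baseline metrics recorded from the current release model on the benchmark
# set, e.g. {"box_precision": 0.66, "box_recall": 0.79} (placeholder values).
BASELINE_FILE = "tests/data/benchmark_baseline.json"
TOLERANCE = 0.02  # tolerate small fluctuations, flag real regressions


def test_benchmark_precision_recall():
    m = main.deepforest()
    m.use_release()
    results = m.evaluate(
        csv_file="tests/data/benchmark_annotations.csv",
        root_dir="tests/data",
        iou_threshold=0.4,
    )

    with open(BASELINE_FILE) as f:
        baseline = json.load(f)

    # Fail the build if either metric drops more than TOLERANCE below baseline.
    for metric in ("box_precision", "box_recall"):
        assert results[metric] >= baseline[metric] - TOLERANCE, (
            f"{metric} dropped from {baseline[metric]:.3f} to {results[metric]:.3f}"
        )
```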
@ethanwhite had a suggestion on how to move forward.
Based on recent discussions, we are viewing model training and releasing as handled separately from this repository. Given that, I'm wondering if this issue should be closed or moved elsewhere.
@henrykironde we should have a tag like "Discuss in our next meeting" or "Further consultation required". It would apply to this issue and #740.