Idea for performance & regression tests
I had an idea for how we could make better performance & regression tests.
We could have a secondary testing system for tests which take a long time to run. We have a number of examples with large data sets which we can turn into tests with relative ease. These tests use real world data in plausible user scenarios.
- MNIST - approx 5 minutes to run
- Hotgym - approx 5 minutes to run
- NAB - several hours to run
- Nupic.py has an example for "MSNBC.com Anonymous Web Data" which tracks users as they browse MSNBC.com and predicts which news article they will read next. I have not tried this example.
- Hopefully more examples?
Design notes:
- Our optimization framework has facilities to run examples and measure their performance. All of our examples already use this framework.
- Keep a persistent file of results, saved locally on the computer which runs the tests.
- These tests take too long to ever run in CI, so keeping the results on the local machine is workable.
- Using the same computer hardware for each test means you can compare the run times between versions of the library. Keep the run-times on file, so that you can check for performance regressions after upgrading nupic (see the sketch after this list).
- This would be a long term project.
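To make the "keep run-times on file" idea concrete, here is a rough sketch (not tied to the actual optimization framework): it times one example run, appends the result together with the installed library version to a local JSON file, and warns when the latest run is noticeably slower than the previous one. The results path, the 10% threshold, the package name, and the example script path are all placeholder assumptions.

```python
# Sketch: record benchmark run-times in a local JSON file and flag regressions.
# Results path, threshold, package name, and example path are hypothetical.
import json
import subprocess
import sys
import time
from pathlib import Path

RESULTS_FILE = Path.home() / ".htm_benchmarks.json"  # local, per-machine history
THRESHOLD = 1.10                                     # warn if >10% slower than last run

try:
    from importlib.metadata import version
    HTM_VERSION = version("htm.core")  # package name is an assumption
except Exception:
    HTM_VERSION = "unknown"

def run_benchmark(cmd):
    """Run one example as a subprocess and return its wall-clock time in seconds."""
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    return time.monotonic() - start

def record(name, seconds):
    """Append the new timing to this machine's history and warn on regressions."""
    history = json.loads(RESULTS_FILE.read_text()) if RESULTS_FILE.exists() else {}
    runs = history.setdefault(name, [])
    if runs and seconds > runs[-1]["seconds"] * THRESHOLD:
        print(f"WARNING: {name} regressed: {runs[-1]['seconds']:.1f}s -> {seconds:.1f}s")
    runs.append({"seconds": seconds, "version": HTM_VERSION, "timestamp": time.time()})
    RESULTS_FILE.write_text(json.dumps(history, indent=2))

if __name__ == "__main__":
    # Hypothetical invocation of the MNIST example; substitute the real example script.
    elapsed = run_benchmark([sys.executable, "py/htm/examples/mnist.py"])
    record("mnist", elapsed)
```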
There are some other datasets I think would be very interesting for HTM, like Neuromorphic MNIST (sparse images) and human activity (both dense and sparse).
I like that idea.
The unit tests currently take too long, but shortening the performance tests would compromise their quality. By making them a separate suite, the performance tests could be made longer/larger to get a more accurate reading. This performance suite should be run at least once on PRs that could impact performance, but would not need to run on every CI build.
I'm not sure. The idea is reasonable, but at Numenta it was a pain keeping these tests in sync and figuring out retroactively what broke.
> This performance suite should be run at least once on PRs that could impact performance, but would not need to run on every CI build.
This would be nearly impossible to automate, but it is OK for us to handle it by acknowledging "suspicious" PRs.
I'd say run the battery of performance tests every day.
We now have the workflow arm.yml, which is essentially a long-running task scheduled nightly.