Rethinking our test suite
This issue will keep track of a potentially big refactor of our tests.
Here are some thoughts, discussed over a few dev calls:
- Building for all architectures for a given language has not proven to be useful, but has proven to be slow! We should probably run our tests for a single architecture and a single language version (for Python).
- Having tests that hit live endpoints on the Signet network is just not a good idea. We should probably have a beefier Regtest setup and run the tests there.
- A lot of our current tests are better thought of as examples. We should consider compiling those but not actually running them. This would catch API breaks and whatnot without having the flakiness of the live tests, while also leaving full live examples devs can run locally if they need to, or use as starting points for building on the API.
A more concrete todo list for this would look something like:
- [ ] Remove all tests that hit live signet endpoints
- [ ] Make use of a persisted wallet that has some balance to test some of the transaction building APIs
- [ ] Build + test only for some of the architectures we support in each language
- [ ] Build (but not run) some examples pulled from the live tests (requires adapting our actions)
- [ ] Add an action that can build and run those examples as a manual trigger using workflow_dispatch
- [ ] Rethink/Remove print statements (see the closed #759)
Further:
- [ ] Look into adding a Regtest setup to bring back some "live" tests but on Regtest
Wondering what the deciding factor will be for which architecture to run on. Maybe the most common one?
> A lot of our current tests are better thought of as examples. We should consider compiling those but not actually running them. This would catch API breaks and whatnot without having the flakiness of the live tests, while also leaving full live examples devs can run locally if they need to, or use as starting points for building on the API.
I think we can do this by adding @ignore annotations (at least for Kotlin) to the current live tests; other languages should have their own version of this. That would allow the tests to be built but not run, which is helpful and less disruptive, especially if we plan to bring the tests back once we have a robust Regtest setup. Next, we make the code build for one or a few architectures.
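A minimal sketch of what that could look like on the Kotlin side, assuming the tests use kotlin.test; the class and test names here are illustrative, not actual test files:

```kotlin
import kotlin.test.Ignore
import kotlin.test.Test

class LiveWalletTest {
    // Still compiled with the rest of the test sources, so API breaks are
    // caught at build time, but the regular test run skips it.
    @Ignore
    @Test
    fun syncAgainstSignet() {
        // existing live-network logic stays as-is
    }
}
```

Swift and Python would need their own equivalent (skip markers or similar) to get the same compile-but-don't-run behaviour.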
I could do this in a series of PRs if we agree with my comment.
CC: @rustaceanrob @reez
Edit: I just noticed live-tests.yaml. That will have to stop what it's currently doing, or only run when we explicitly trigger it.
Sounds good to me as a first step!
I also think in general that the "live tests" don't all need to be ported/transformed into examples (too redundant), so it makes more sense to start fresh on the examples side. We also don't need to duplicate the work/examples that are in the bookofbdk (examples here), so it's a tricky balance.
One small issue I have with the approach, however, is that it would be good to at least keep the option to run some live tests, so we can run them locally or trigger them on command in GitHub, just for sanity checks. With @ignore, we might not be able to run them at all? Let me know if there is a way to do this cleanly.
> Sounds good to me as a first step!
>
> I also think in general that the "live tests" don't all need to be ported/transformed into examples (too redundant), so it makes more sense to start fresh on the examples side. We also don't need to duplicate the work/examples that are in the bookofbdk (examples here), so it's a tricky balance.
Alright, sounds good. Each language folder should then have its own examples folder, rather than a global examples folder with language subfolders. That will help with the pipeline actions.
> One small issue I have with the approach, however, is that it would be good to at least keep the option to run some live tests, so we can run them locally or trigger them on command in GitHub, just for sanity checks. With @ignore, we might not be able to run them at all? Let me know if there is a way to do this cleanly.
I agree; given that context, @ignore is not the best option. Looking more closely at the pipeline, live-tests only runs once a week or by manual execution, and the live tests are already excluded from the test commands in the test workflow files for the various languages. So at the moment we really only run the live tests once a week (and we can decide whether that is too frequent, or whether we don't want them to run at all), which makes me wonder why we don't just leave the tests as they are. Sorry if I am making this longer than it should be; I just want to be sure I am not missing the plot.
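If we do want skip-by-default behaviour while keeping an on-demand switch, one option is tag-based filtering instead of @ignore. A sketch for the Kotlin side, assuming the tests run on JUnit 5 under Gradle; the `live` tag and the `includeLiveTests` property are hypothetical names, not something in the repo today:

```kotlin
// build.gradle.kts
tasks.withType<Test> {
    useJUnitPlatform {
        if (project.hasProperty("includeLiveTests")) {
            includeTags("live")   // opt-in: run only the live-network tests
        } else {
            excludeTags("live")   // default: compile them, but skip at runtime
        }
    }
}
```

Then `./gradlew test` skips anything annotated with `@Tag("live")`, while `./gradlew test -PincludeLiveTests` runs those tests, which a manual workflow_dispatch job (or a local run) could invoke for sanity checks.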
> Wondering what the deciding factor will be for which architecture to run on. Maybe the most common one?
On this: we currently build Python on 4 different platforms, and each of those builds 5 different Python versions, making 20 builds in total. Thankfully, @rustaceanrob has a PR to remove 2 versions, which should bring the number down to 12. I am thinking maybe we could run 1 distinct version on one architecture instead of all versions on all architectures.