spacepy
Find uncaught warnings in CI
The CI really should fail on uncaught warnings. I'm not sure of the best way to do this; `-Werror` seems to raise the exception immediately, so `catch_warnings` is ineffective and there are lots of false positives (although that's from memory). Maybe setting a really clever warnings filter? Or we could grep the output of the tests and look for uncaught warnings. It would just be nice to have something other than "@balarsen upgraded numpy and opened an issue" as the way to find these things.
Obviously #455, #425, #461, and any other pending warnings need to be cleaned up before this happens.
Couple of possible approaches that come to mind:

We could wrap the call to `unittest.main` in `test_all.py` with a `catch_warnings` to catch uncaught warnings and then fail. But I don't know if `unittest.main` returns. We could also get fancy and extend `unittest.main` or subclass `TestCase` to see if there are uncaught warnings...this could even extend to having a different status reported (e.g. W instead of F for fail, E for error).
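For what it's worth, `unittest.main` does return if called with `exit=False` (it hands back its `TestProgram` instead of calling `sys.exit`), so the wrapping approach is feasible. A minimal sketch, with a stand-in noisy test where `test_all.py` would supply the real suite:

```python
import io
import types
import unittest
import warnings


class Demo(unittest.TestCase):
    """Stand-in for the real suite: a test that lets a warning escape."""
    def test_noisy(self):
        warnings.warn("pretend upstream deprecation", DeprecationWarning)


def run_with_warning_check():
    """Run the suite, then fail the run if any warning escaped uncaught."""
    suite_module = types.ModuleType("fake_test_all")  # stands in for test_all
    suite_module.Demo = Demo
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record even normally-hidden ones
        # exit=False makes unittest.main return its TestProgram rather than
        # calling sys.exit, so control does come back to us here.
        prog = unittest.main(module=suite_module, argv=["test_all"],
                             exit=False,
                             testRunner=unittest.TextTestRunner(
                                 stream=io.StringIO()))
    ok = prog.result.wasSuccessful()
    for w in caught:
        print("Uncaught warning: {}: {}".format(
            w.category.__name__, w.message))
        ok = False
    return ok
```

In `test_all.py` the module argument would be the real test module and the runner's output would go to the console; returning `False` (and then exiting nonzero) is what would make the CI go red.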
So I'm torn on having the CI actually fail on uncaught warnings. The CI indicates to the wider world whether we're passing tests, but a reasonable user could easily infer, from seeing that banner saying CI failed, that there are unresolved (known) bugs affecting functionality. Uncaught warnings from upstream don't necessarily rise to that level.
I do see the need to trap them automatically somehow so we get notified, and obviously having the tests fail is an easy way to do that.
My ideal solution - and I don't know if this is possible - would be to have our CI open an issue (or comment on a PR, whatever) when there are uncaught warnings and assign to devs.
It looks to me like GH Actions can be used to open an issue (https://github.com/marketplace/actions/open-github-issue). Not sure how we'd set that workflow up, but it avoids the appearance of SpacePy being "broken" when it's an upstream `DeprecationWarning` that doesn't immediately impact functionality or behaviour.
Subclassing `TestCase` to report a different status could help with a workflow like that.
I think it would be possible for each test job to parse the output and look for warnings, then pass the information out to the "all tests" job which could take action. We wouldn't want the test job itself to take the action since that would result in opening an issue for every Python version, dependency version, etc. There's a concept of "output" from the job that other jobs can look at and we could use that.
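The parsing side could be as simple as this sketch (the log file name and the warning regex are assumptions about how we'd capture test output; `GITHUB_OUTPUT` is the standard Actions mechanism for step outputs):

```python
import os
import re


def count_warnings(log_text):
    """Count lines that look like an uncaught-warning report, e.g.
    '/path/module.py:12: DeprecationWarning: ...'."""
    return len(re.findall(r"\b\w*Warning\b:", log_text))


def publish_warning_count(logfile="test_output.log"):
    """Write the count to GITHUB_OUTPUT so later jobs can read it."""
    with open(logfile) as f:
        n = count_warnings(f.read())
    out_path = os.environ.get("GITHUB_OUTPUT")
    if out_path:  # only set when running under GitHub Actions
        with open(out_path, "a") as f:
            f.write("warning_count={}\n".format(n))
    return n
```

The workflow would then map that step output into the job's `outputs:` block, and the "all tests" job reads it via `needs.<job>.outputs.warning_count` before deciding what to do.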
I'm thinking the right thing would be for a test to fail if it's a PR build, and pass but open an issue if it's a cron job build. The trick would be to not open an issue every single time the cron job runs. I don't know if we have access to, say, the status of the past cron run or something.
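The PR-vs-cron split itself is easy to detect, since Actions sets `GITHUB_EVENT_NAME` (`pull_request` for PR builds, `schedule` for cron runs); a sketch, with policy names that are purely our own convention:

```python
import os


def warning_policy(event_name=None):
    """What to do about uncaught warnings, based on how we were triggered."""
    event = event_name or os.environ.get("GITHUB_EVENT_NAME", "")
    if event == "pull_request":
        return "fail"        # make the PR check go red
    if event == "schedule":
        return "open-issue"  # pass, but file (or update) an issue
    return "ignore"          # pushes, manual runs, local runs
```

For not re-opening an issue on every cron run, the issue-opening step could search for an existing open issue with a known title before creating a new one; I haven't checked whether the marketplace action supports that directly.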
I'm not a big fan of doing things just for the optics (e.g. we've got a lot of open issues and I don't want to close just to close) but in practice if a build failure sends an email and we then open an issue "by hand," it probably makes more sense for the computer to do that work for us instead.
Following the upshot of #425, whatever course we take, we shouldn't do this on old versions of dependencies. Every Python version has an "oldest" and a "newest" dependency strategy in the CI...I suggest we ignore warnings on "oldest" and only care about them on "newest."