spacepy
Find uncaught warnings in CI
The CI really should fail on uncaught warnings. I'm not sure of the best way to do this; `-Werror` seems to raise the exception immediately, so `catch_warnings` is ineffective and there are lots of false positives (although that's from memory). Maybe setting a really clever warnings filter? Or we could grep the output of the tests and look for uncaught warnings. It would just be nice to have something other than "@balarsen upgraded numpy and opened an issue" as the way to find these things.
Obviously #455, #425, #461, and any other pending warnings need to be cleaned up before this happens.
Couple of possible approaches that come to mind:

We could wrap the call to `unittest.main` in `test_all.py` with a `catch_warnings` to catch uncaught warnings and then fail. But I don't know if `unittest.main` returns. We could also get fancy and extend `unittest.main` or subclass `TestCase` to see if there are uncaught warnings...this could even extend to having a different status reported (e.g. W instead of F for fail, E for error).
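For what it's worth, `unittest.main` does return if called with `exit=False` (it hands back its `TestProgram` instead of calling `sys.exit`), so the wrapping approach is feasible. A minimal sketch, with a stand-in noisy test where `test_all.py` would supply the real suite:

```python
import io
import types
import unittest
import warnings


class Demo(unittest.TestCase):
    """Stand-in for the real suite: a test that lets a warning escape."""
    def test_noisy(self):
        warnings.warn("pretend upstream deprecation", DeprecationWarning)


def run_with_warning_check():
    """Run the suite, then fail the run if any warning escaped uncaught."""
    suite_module = types.ModuleType("fake_test_all")  # stands in for test_all
    suite_module.Demo = Demo
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record even normally-hidden ones
        # exit=False makes unittest.main return its TestProgram rather than
        # calling sys.exit, so control does come back to us here.
        prog = unittest.main(module=suite_module, argv=["test_all"],
                             exit=False,
                             testRunner=unittest.TextTestRunner(
                                 stream=io.StringIO()))
    ok = prog.result.wasSuccessful()
    for w in caught:
        print("Uncaught warning: {}: {}".format(
            w.category.__name__, w.message))
        ok = False
    return ok
```

In `test_all.py` the module argument would be the real test module and the runner's output would go to the console; returning `False` (and then exiting nonzero) is what would make the CI go red.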
So I'm torn on having the CI actually fail on uncaught warnings. The CI indicates to the wider world whether we're passing tests, but a reasonable user could easily infer, from seeing that banner saying CI failed, that there are unresolved (known) bugs affecting functionality. Uncaught warnings from upstream don't necessarily rise to that level.
I do see the need to trap them automatically somehow so we get notified, and obviously having the tests fail is an easy way to do that.
My ideal solution - and I don't know if this is possible - would be to have our CI open an issue (or comment on a PR, whatever) when there are uncaught warnings and assign to devs.
It looks to me like GH Actions can be used to open an issue (https://github.com/marketplace/actions/open-github-issue). Not sure how we'd set that workflow up, but it avoids the appearance of SpacePy being "broken" when it's an upstream `DeprecationWarning` that doesn't immediately impact functionality or behaviour.
Subclassing `TestCase` to report a different status could help with a workflow like that.
I think it would be possible for each test job to parse the output and look for warnings, then pass the information out to the "all tests" job which could take action. We wouldn't want the test job itself to take the action since that would result in opening an issue for every Python version, dependency version, etc. There's a concept of "output" from the job that other jobs can look at and we could use that.
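The parsing side could be as simple as this sketch (the log file name and the warning regex are assumptions about how we'd capture test output; `GITHUB_OUTPUT` is the standard Actions mechanism for step outputs):

```python
import os
import re


def count_warnings(log_text):
    """Count lines that look like an uncaught-warning report, e.g.
    '/path/module.py:12: DeprecationWarning: ...'."""
    return len(re.findall(r"\b\w*Warning\b:", log_text))


def publish_warning_count(logfile="test_output.log"):
    """Write the count to GITHUB_OUTPUT so later jobs can read it."""
    with open(logfile) as f:
        n = count_warnings(f.read())
    out_path = os.environ.get("GITHUB_OUTPUT")
    if out_path:  # only set when running under GitHub Actions
        with open(out_path, "a") as f:
            f.write("warning_count={}\n".format(n))
    return n
```

The workflow would then map that step output into the job's `outputs:` block, and the "all tests" job reads it via `needs.<job>.outputs.warning_count` before deciding what to do.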
I'm thinking the right thing would be for a test to fail if it's a PR build, and pass but open an issue if it's a cron job build. The trick would be to not open an issue every single time the cron job runs. I don't know if we have access to, say, the status of the past cron run or something.
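The PR-vs-cron split itself is easy to detect, since Actions sets `GITHUB_EVENT_NAME` (`pull_request` for PR builds, `schedule` for cron runs); a sketch, with policy names that are purely our own convention:

```python
import os


def warning_policy(event_name=None):
    """What to do about uncaught warnings, based on how we were triggered."""
    event = event_name or os.environ.get("GITHUB_EVENT_NAME", "")
    if event == "pull_request":
        return "fail"        # make the PR check go red
    if event == "schedule":
        return "open-issue"  # pass, but file (or update) an issue
    return "ignore"          # pushes, manual runs, local runs
```

For not re-opening an issue on every cron run, the issue-opening step could search for an existing open issue with a known title before creating a new one; I haven't checked whether the marketplace action supports that directly.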
I'm not a big fan of doing things just for the optics (e.g. we've got a lot of open issues and I don't want to close just to close) but in practice if a build failure sends an email and we then open an issue "by hand," it probably makes more sense for the computer to do that work for us instead.
Following the upshot of #425, whatever course we take, we shouldn't do this on old versions of dependencies. Every Python version has an "oldest" and a "newest" dependency strategy in the CI...I suggest we ignore warnings on "oldest" and only care about them on "newest."