feature request: notification of test maintainer on failure

Open brandongc opened this issue 5 years ago • 9 comments

We would like to have optional notifications sent to the maintainer of a test when the test fails.

import reframe as rfm


@rfm.simple_test
class mytest(rfm.RegressionTest):
    def __init__(self):
        super().__init__()
        self.descr = "A Simple test"
        self.valid_systems = ['cori:knl',
                              'gerty:knl']
        self.valid_prog_environs = ['PrgEnv-intel']
        self.maintainers = ['[email protected]']
        # Proposed new attribute: opt in to failure notifications for this test
        self.notify_maintainers = True

or maybe

self.maintainers = [{'email': '[email protected]', 'notify' : True}, '[email protected]']

where non-dict items are treated as "notify" : False.
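
To make the second form concrete, here is a minimal sketch of how such a mixed list could be normalized. The `Maintainer` record and `normalize_maintainers` helper are hypothetical names used only for illustration; nothing like them exists in ReFrame today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Maintainer:
    """Hypothetical record for one maintainer entry."""
    email: str
    notify: bool = False


def normalize_maintainers(entries):
    """Convert a mixed list of strings and dicts into Maintainer records.

    Plain (non-dict) items are treated as notify=False, as proposed above.
    """
    result = []
    for entry in entries:
        if isinstance(entry, dict):
            result.append(
                Maintainer(entry['email'], bool(entry.get('notify', False)))
            )
        else:
            result.append(Maintainer(str(entry)))
    return result


# Usage with placeholder addresses
maintainers = normalize_maintainers(
    [{'email': 'alice@example.com', 'notify': True}, 'bob@example.com']
)
to_notify = [m.email for m in maintainers if m.notify]
```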

Is the logging framework the right place to develop something like this? e.g.

logging_config = {
    'level': 'DEBUG',
    'handlers': [
        {
            'type': 'email',
            'level': 'ERROR',
            'format': '[%(asctime)s] %(levelname)s: '
                      '%(check_info)s: %(message)s',
        }
    ]
}

brandongc avatar Feb 06 '20 21:02 brandongc

@brandongc that’s a valid feature request. I’m not sure yet what would be the best place to implement this. Do you want the notifications to be controlled per test or just globally?

vkarak avatar Feb 07 '20 17:02 vkarak

@brandongc I see this mostly as a separate or alternative reporting step at the end of the run rather than as part of the logging. Logging was conceived mainly for recording the framework's own activities, not for check-specific work or notifications.

vkarak avatar Feb 11 '20 13:02 vkarak

I think per-test controls make the most sense, e.g. I might be the "maintainer" of several tests but only want to be notified for some subset of them. @bcfriesen what do you think?

brandongc avatar Feb 17 '20 00:02 brandongc

Another argument for a separate test result notification component is that a site might also want to automatically open issues on a JIRA board in case of failures.

vkarak avatar Feb 18 '20 09:02 vkarak

Yes, I think per-test control is the appropriate granularity.

How much work would it be to implement notifications for a performance test failure vs. some other "binary" failure (like a failed compilation)?

bcfriesen avatar Feb 18 '20 18:02 bcfriesen

@bcfriesen There is no difference between implementing binary and non-binary notifications. All this information is available to the framework and this is what it reports. Adding notifications should be another type of reporting or a reporting step. Implementation-wise, it shouldn’t be a big thing, because it is peripheral to the core framework. It just needs to be designed such that it is extensible to other types of notifications.
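
A rough sketch of what such an extensible reporting step might look like, purely as an illustration. The `Notifier` interface, the `InMemoryNotifier` backend and the shape of the failure records are all assumptions made for this example, not existing ReFrame APIs.

```python
from abc import ABC, abstractmethod


class Notifier(ABC):
    """Hypothetical backend interface (email, webhook, JIRA, ...)."""

    @abstractmethod
    def send(self, recipient, subject, body):
        """Deliver one notification to one recipient."""


class InMemoryNotifier(Notifier):
    """Stand-in backend that only records what it would have sent."""

    def __init__(self):
        self.sent = []

    def send(self, recipient, subject, body):
        self.sent.append((recipient, subject, body))


def notify_failures(failures, notifiers):
    """Reporting step: send every failure to every maintainer and backend.

    Each failure is assumed to be a dict with 'name', 'reason' and
    'maintainers' keys; the kind of failure (performance, sanity,
    compilation) only changes the 'reason' text.
    """
    count = 0
    for failure in failures:
        subject = f"{failure['name']} failed"
        for recipient in failure['maintainers']:
            for notifier in notifiers:
                notifier.send(recipient, subject, failure['reason'])
                count += 1
    return count


backend = InMemoryNotifier()
sent_count = notify_failures(
    [{'name': 'mytest', 'reason': 'performance',
      'maintainers': ['alice@example.com']}],
    [backend],
)
```

Adding a new notification type would then only mean adding another `Notifier` subclass, leaving the reporting step untouched.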

vkarak avatar Feb 18 '20 19:02 vkarak

I think we also want to make sure this is controllable, maybe via a CLI opt-in option like --notifications=on? I don't want to get spammed by people using reframe to test their work on our development systems.

Another thought is to keep it extensible beyond email. One possibility is to also consider webhooks to enable things like Slack integration.
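
As a sketch of the webhook idea: Slack incoming webhooks accept an HTTP POST with a JSON body containing a `text` field, so the request can be built with the standard library alone. The `build_webhook_request` helper and the URL below are placeholders for illustration, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_webhook_request(url, test_name, reason):
    """Build (but do not send) a JSON POST request for a chat webhook."""
    payload = {'text': f'{test_name} failed: {reason}'}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


# Placeholder URL; a real one would come from the site configuration
request = build_webhook_request(
    'https://hooks.example.invalid/services/PLACEHOLDER',
    'mytest',
    'performance regression',
)
# urllib.request.urlopen(request) would actually deliver it
```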

And finally I wonder what context could/should be included in a notification beyond "$test failed".

brandongc avatar Jul 08 '20 18:07 brandongc

Hi @brandongc

> I think we also want to make sure this is controllable, maybe via a CLI opt-in option like --notifications=on? I don't want to get spammed by people using reframe to test their work on our development systems.

That's pretty straightforward now in ReFrame 3.0. We can combine command line options with configuration parameters and environment variables very easily, so such a feature could easily be enabled in several different ways.
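
To illustrate the idea only (this is not how ReFrame resolves its options internally), an opt-in switch could be looked up in the order command line, then environment, then configuration. The `--notifications` flag, the `RFM_NOTIFICATIONS` variable and the `notifications` config key are all hypothetical names.

```python
import argparse
import os


def notifications_enabled(argv, environ=None, config=None):
    """Resolve a hypothetical opt-in switch: CLI > environment > config."""
    environ = os.environ if environ is None else environ
    config = {} if config is None else config

    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument('--notifications', choices=['on', 'off'],
                        default=None)
    # Ignore any other options that belong to the real command line
    args, _ = parser.parse_known_args(argv)
    if args.notifications is not None:
        return args.notifications == 'on'

    env_value = environ.get('RFM_NOTIFICATIONS')
    if env_value is not None:
        return env_value.lower() in ('1', 'on', 'true', 'yes')

    # Notifications stay off unless something explicitly turns them on
    return bool(config.get('notifications', False))


enabled = notifications_enabled(['--notifications=on'], environ={}, config={})
```

Defaulting to "off" is what keeps people testing their own work on development systems from spamming the maintainers.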

> Another thought is to keep it extensible beyond email. One possibility is to also consider webhooks to enable things like Slack integration.

That's a valid thing to take into account. Being able to open JIRA tickets directly for example would also be nice as a future extension.

> And finally I wonder what context could/should be included in a notification beyond "$test failed".

Regarding this, we're working in #1377 on generating a JSON report for the full run (and, as a result, for each test). It contains all the test-related information, so it could be sent along as well, perhaps with a summary of it as the notification body. In fact, once this is merged (hopefully for 3.1), you could have any custom external tool read that JSON and send out notifications at will.
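
Such an external tool could be as small as the sketch below. Note that the report layout used here (a top-level `testcases` list whose entries have `name`, `result` and `maintainers` fields) is an assumption made for the example; the actual schema is whatever #1377 ends up producing.

```python
import json
from collections import defaultdict


def load_report(path):
    """Read a JSON run report from disk."""
    with open(path) as fp:
        return json.load(fp)


def failures_by_maintainer(report):
    """Group the names of failed tests by maintainer.

    Assumes a hypothetical schema: report['testcases'] is a list of dicts
    with 'name', 'result' and 'maintainers' keys.
    """
    grouped = defaultdict(list)
    for case in report.get('testcases', []):
        if case.get('result') != 'failure':
            continue
        for maintainer in case.get('maintainers', []):
            grouped[maintainer].append(case['name'])
    return dict(grouped)


# Inline sample standing in for a report file
sample = json.loads('''
{"testcases": [
  {"name": "mytest", "result": "failure",
   "maintainers": ["alice@example.com"]},
  {"name": "othertest", "result": "success",
   "maintainers": ["alice@example.com", "bob@example.com"]}
]}
''')
summary = failures_by_maintainer(sample)
```

Each key of `summary` is then one recipient and each value is the list of failed tests to mention in that recipient's notification body.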

vkarak avatar Jul 08 '20 18:07 vkarak

Oh, and most probably this feature request will be postponed to 3.2 due to lack of bandwidth currently :-D

vkarak avatar Jul 08 '20 18:07 vkarak