Add support for Juniper CHASSIS and SYSTEM alerts

Open hmpf opened this issue 2 years ago • 10 comments

Closes #2358

hmpf avatar Apr 06 '22 09:04 hmpf

Codecov Report

Merging #2388 (204d406) into master (33b5913) will increase coverage by 0.08%. The diff coverage is 100.00%.

:exclamation: Current head 204d406 differs from pull request most recent head dcff55b. Consider uploading reports for the commit dcff55b to get more accurate results

@@            Coverage Diff             @@
##           master    #2388      +/-   ##
==========================================
+ Coverage   54.52%   54.60%   +0.08%     
==========================================
  Files         558      560       +2     
  Lines       40644    40709      +65     
==========================================
+ Hits        22160    22231      +71     
+ Misses      18484    18478       -6     
| Impacted Files | Coverage Δ |
|---|---|
| python/nav/ipdevpoll/plugins/juniperalarm.py | 100.00% <100.00%> (ø) |
| python/nav/mibs/juniper_alarm_mib.py | 100.00% <100.00%> (ø) |

... and 2 files with indirect coverage changes

codecov[bot] avatar Apr 06 '22 09:04 codecov[bot]

Test results

12 files, 12 suites, 11m 21s :stopwatch:
3 256 tests: 3 160 :heavy_check_mark:, 96 :zzz:, 0 :x:
9 243 runs: 8 955 :heavy_check_mark:, 288 :zzz:, 0 :x:

Results for commit dcff55b5.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Apr 06 '22 09:04 github-actions[bot]

Currently, if a netbox has a non-zero count of red or yellow alarms, a start-event is sent; if the count is zero, an end-event is sent. Two checks are missing: whether a state is already open for the netbox, and whether the netbox supports the MIB in question.

Also, tests needed.
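
The decision logic described above, including the two missing checks, can be sketched roughly as follows. This is plain Python, not NAV code: `AlarmStateTracker`, `decide` and the in-memory `open_states` set are hypothetical stand-ins for whatever the plugin and the database would actually provide, and `count=None` is an assumed way of signalling that the netbox does not support the MIB.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple


@dataclass
class AlarmStateTracker:
    """Hypothetical in-memory stand-in for the set of open alarm states."""

    open_states: Set[Tuple[str, str]] = field(default_factory=set)

    def decide(self, netbox: str, color: str, count: Optional[int]) -> Optional[str]:
        """Return "start", "end" or None for a single alarm counter reading.

        count is None when the netbox gave no answer for the alarm MIB
        (an assumption made for this sketch).
        """
        if count is None:
            return None  # MIB not supported: post nothing at all
        key = (netbox, color)
        if count > 0 and key not in self.open_states:
            self.open_states.add(key)
            return "start"  # first non-zero reading opens a state
        if count == 0 and key in self.open_states:
            self.open_states.remove(key)
            return "end"  # zero reading closes an existing state
        return None  # no state change, so no event
```

With a check like this, a netbox that stays at zero alarms produces no events at all, which is also an obvious first case for a unit test.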

hmpf avatar Apr 07 '22 10:04 hmpf

Kudos, SonarCloud Quality Gate passed!

- 0 Bugs (rating A)
- 0 Vulnerabilities (rating A)
- 0 Security Hotspots (rating A)
- 0 Code Smells (rating A)

No Coverage information
0.0% Duplication

sonarqubecloud[bot] avatar Apr 08 '22 10:04 sonarqubecloud[bot]

The actual count could possibly be stored together with the event with the help of EventQueueVar. Any good examples where this is done?

hmpf avatar Apr 08 '22 10:04 hmpf

> eventengine will by-design ignore the end-events that appear without a corresponding start-event having been posted first

Cool, how convenient!

hmpf avatar Jun 16 '22 13:06 hmpf

> The actual count could possibly be stored together with the event with the help of EventQueueVar. Any good examples where this is done?

Whether this example is good is debatable, but here is one instance of setting arbitrary event variables through the "varmap" (line 160 should be highlighted):

https://github.com/Uninett/nav/blob/e6634e512c8ecf283c85a701366620e724806ab7/python/nav/ipdevpoll/shadows/gwpeers.py#L147-L171

There are two issues that would make it difficult to come to an ideal solution:

  1. EventQueueVars aren't automatically carried over to the corresponding alerthist entries that eventengine generates (though I think they are copied into the alert queue - however, alert queue entries represent notifications and are removed once notifications are sent).

Usually, if you want to carry arbitrary variables over to the permanent record of alerthist/AlertHistory, you need to write an event handler plugin that does so explicitly. Currently, I think the only plugin that does so is the event plugin for maintenance events, which does it here:

https://github.com/Uninett/nav/blob/e6634e512c8ecf283c85a701366620e724806ab7/python/nav/eventengine/plugins/maintenancestate.py#L42

  2. Secondly, a state is a state in NAV: there isn't really a mechanism to add more events or information to an existing alerthist state. So, if the alert count changes over time (but remains non-zero), there isn't really an effective way to update an existing "juniper red alert non-zero" state; the new event will just be treated as "oh, here's a duplicate start-event, I'll throw it away". You might, however, be able to add some magic by implementing an eventengine plugin for your new event type.

So, presently, you can generate an alert when the "red count" transitions from 0 to 1, and this alert can say "there's 1 red alert". However, when the counter subsequently transitions from 1 to 2, there is no way to notify the NAV user that "there are now 2 red alerts". Again, this analysis is from memory. Unless it is already possible, we could adapt the event engine so that a custom plugin can override the handling of "duplicate" events.
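
To make the "duplicate start-event" behaviour and the proposed override concrete, here is a toy model. None of it is NAV's actual eventengine API: `Event`, `ToyEngine`, `on_duplicate` and `count_changed` are hypothetical names invented for this sketch, and notifications are just strings in a list.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Event:
    netbox: str
    event_type: str
    state: str  # "start" or "end"
    variables: Dict[str, str] = field(default_factory=dict)


# Hypothetical hook: return True if a duplicate start-event should be processed.
DuplicateHandler = Callable[[Event, Event], bool]


class ToyEngine:
    """Toy stateful event processor; not NAV's real event engine."""

    def __init__(self, on_duplicate: Optional[DuplicateHandler] = None):
        self.open_states: Dict[Tuple[str, str], Event] = {}
        self.notifications: List[str] = []
        self.on_duplicate = on_duplicate

    def handle(self, event: Event) -> None:
        key = (event.netbox, event.event_type)
        existing = self.open_states.get(key)
        count = event.variables.get("count")
        if event.state == "start":
            if existing is None:
                # First start-event opens the state and notifies.
                self.open_states[key] = event
                self.notifications.append(f"{event.netbox}: {count} red alarm(s)")
            elif self.on_duplicate and self.on_duplicate(existing, event):
                # The hook accepted the duplicate: update the state, notify again.
                existing.variables.update(event.variables)
                self.notifications.append(f"{event.netbox}: now {count} red alarm(s)")
            # Otherwise the duplicate start-event is silently dropped.
        elif event.state == "end" and existing is not None:
            del self.open_states[key]
            self.notifications.append(f"{event.netbox}: red alarms cleared")
        # An end-event without an open state is ignored.


def count_changed(existing: Event, new: Event) -> bool:
    """Example hook: accept a duplicate only if the alarm count differs."""
    return existing.variables.get("count") != new.variables.get("count")
```

Without a hook, `ToyEngine()` sends one notification for the 0 to 1 transition and drops the 1 to 2 event; with `ToyEngine(on_duplicate=count_changed)` the second event updates the open state and produces a second notification.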

lunkwill42 avatar Jun 17 '22 07:06 lunkwill42

> So, presently, you can generate an alert when the "red count" transitions from 0 to 1, and this alert can say "there's 1 red alert". However, when the counter subsequently transitions from 1 to 2, there is no way to notify the NAV user that "there are now 2 red alerts". Again, this analysis is from memory. Unless it is already possible, we could adapt the event engine so that a custom plugin can override the handling of "duplicate" events.

The maintenanceState plugin already suggests that we could work around the "duplicate" handling:

https://github.com/Uninett/nav/blob/e6634e512c8ecf283c85a701366620e724806ab7/python/nav/eventengine/plugins/maintenancestate.py#L43-L46

This means that we could potentially detect a change in the red/yellow alarm count, update the existing alert history state and send an extra notification. However, there is still no good way to maintain a history/log of the changing alarm counts over time. Maybe storing a current value and a maximum value as alerthistvars? It might be time for a fuller design discussion with the CNaaS team who wanted this feature :)
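
The "current value plus maximum value" idea could look something like the sketch below. It operates on a plain dict standing in for the variables attached to an open alert state; the helper name `update_count_vars` and the keys `count` and `max_count` are made up for illustration, and values are kept as strings on the assumption that such variables are plain text key/value pairs.

```python
from typing import Dict


def update_count_vars(alert_vars: Dict[str, str], new_count: int) -> bool:
    """Record the current and the highest seen alarm count in alert_vars.

    Updates the dict in place and returns True if the current count changed,
    which a caller could use to decide whether to send an extra notification.
    """
    old_count = int(alert_vars.get("count", "0"))
    old_max = int(alert_vars.get("max_count", str(old_count)))
    alert_vars["count"] = str(new_count)
    alert_vars["max_count"] = str(max(old_max, new_count))
    return new_count != old_count
```

This keeps only two numbers per state, so it does not give a full history of how the count moved over time, only where it is now and how high it has been.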

lunkwill42 avatar Jun 17 '22 08:06 lunkwill42

> It might be time for a fuller design discussion with the CNaaS team who wanted this feature :)

So I did have a short discussion with @knutvi on this. I'm adding our conclusion to the original issue #2358.

lunkwill42 avatar Jun 17 '22 12:06 lunkwill42

I can confirm that every five minutes we get two logging messages about ignoring an end event for each netbox.

johannaengland avatar Jan 12 '23 07:01 johannaengland
