Fix Huawei WLC AP Parsing crash on offline APs

Open throjaisnn opened this issue 1 year ago • 2 comments

General information

Affected Device: Huawei AC6508 Wireless LAN Controller (WLC)

Software Version: Huawei NMS Version 2.23.00.100.932

CheckMK Version: MSP 2.3.0p12

Issue Summary: The Checkmk check for the Huawei AC6508 WLC crashes when the controller reports an offline Access Point (AP). The crash is caused by an out-of-range access on the aps_info2 list when the AP is no longer available, causing the check to fail.

Bug reports

Operating System: Debian 12 running Checkmk MSP 2.3.0p12

Steps to Reproduce:

  • Run a device discovery on a Huawei AC6508 WLC Controller with disconnected (offline) Access Points.
  • Run the attached string_table (test.txt) against the function parse_huawei_wlc_aps.

SNMP walk: The following file contains a full, redacted SNMP walk: Huawei AC6508 WLC snmpwalk.txt

Crash Report ID: 48274138-87a8-11ef-a33a-0050568fc548

Proposed changes

Expected Behavior: When an AP goes offline, CheckMK should handle it gracefully, avoiding any out-of-range memory access. The system should log appropriate warnings or errors rather than attempting to access data from the unavailable AP.

Proposed Patch Change: The patch modifies the AP handling logic to skip further parsing of AP information when the AP is not available. This prevents out-of-range access and stabilizes the system.
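The idea of the patch can be illustrated with a minimal sketch. This is not the actual patch or the real Checkmk parse function. It assumes, purely for illustration, that the string table has one row per AP in a first table (status, name) and one row per online AP in a second table, so that indexing the second table by AP position runs out of range once an AP is offline. The function name `parse_aps_sketch`, the column layout, and the status value `"8"` for an online AP are all hypothetical.

```python
from typing import Dict, Sequence

ONLINE = "8"  # assumed status code for an online AP; hypothetical value


def parse_aps_sketch(
    string_table: Sequence[Sequence[Sequence[str]]],
) -> Dict[str, Dict[str, str]]:
    """Parse AP rows, reading a detail row only for APs that are online."""
    aps_info1, aps_info2 = string_table
    parsed: Dict[str, Dict[str, str]] = {}
    # Consume detail rows one at a time instead of indexing by AP position,
    # so an offline AP without a detail row cannot cause an IndexError.
    detail_rows = iter(aps_info2)
    for status, name in aps_info1:
        entry = {"status": status}
        if status == ONLINE:
            row = next(detail_rows, None)  # None if the table is shorter than expected
            if row is not None:
                entry["users"] = row[0]
        # Offline APs keep their status but get no further parsing.
        parsed[name] = entry
    return parsed
```

With this shape, an offline AP still appears in the parsed result with its status, and the input tables are only read, never modified.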

throjaisnn avatar Oct 11 '24 09:10 throjaisnn

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

github-actions[bot] avatar Oct 11 '24 09:10 github-actions[bot]

I have read the CLA Document and I hereby sign the CLA or my organization already has a signed CLA.

throjaisnn avatar Oct 11 '24 09:10 throjaisnn

Hi @throjaisnn, thanks for creating the pull request! We have a question about how filtering out all offline APs affects Checkmk services that monitor APs that were previously online. Could you describe what happens to a service when an AP that was online goes offline?

msprdctmgr avatar Feb 28 '25 11:02 msprdctmgr

Hi @msproductmanager, as soon as there was an offline AP on the Huawei controller, no AP could be monitored anymore and the CheckMK monitoring module failed. I don't have access to a similar system anymore, but you should see this exact behaviour in the crash report.

throjaisnn avatar Mar 13 '25 07:03 throjaisnn

Hi @throjaisnn thank you for your response. Unfortunately, we cannot merge the code as it is because it modifies the function argument in place. Therefore, we will close this pull request without merging it. If possible, please suggest another solution that does not change the function argument. We also need some context on how this would affect the behavior of the services created for the APs that went offline.
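One way to address the in-place modification concern is to build new lists instead of deleting rows from the argument. The following is only a sketch under assumptions: the helper name `without_offline_aps`, the two-table layout, and the status value `"8"` for an online AP are hypothetical and not taken from the Checkmk source.

```python
from typing import List

StringTable = List[List[List[str]]]

ONLINE = "8"  # assumed status code for an online AP; hypothetical value


def without_offline_aps(string_table: StringTable) -> StringTable:
    """Return a filtered copy of the tables; the argument is left untouched."""
    aps_info1, aps_info2 = string_table
    # Build fresh lists rather than calling del/remove on the caller's data.
    online_rows = [list(row) for row in aps_info1 if row[0] == ONLINE]
    detail_rows = [list(row) for row in aps_info2]
    return [online_rows, detail_rows]
```

The sketch only covers the mutation concern. Whether dropping offline rows is the right behaviour for services of APs that were previously online remains the open question raised above.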

msprdctmgr avatar Mar 13 '25 14:03 msprdctmgr