fortinet_get_system_ha_status.textfsm parse error for non "OK" HA Health Status

Open pnpestov opened this issue 10 months ago • 1 comments

ISSUE TYPE
  • Template Issue with error and raw data
TEMPLATE USING

# FG Version: 5.6, 6.0, 6.2, 6.4
# HW : varied
Value HA_HEALTH (\S+)
Value MODEL (\S+)
Value HA_MODE ([\S\s]+)
Value HA_GROUP (\S+)
Value CLUSTER_UPTIME ([\S\s]+)
Value CLUSTER_STATE_CHANGED_TIME ([\S\s]+)
Value HA_SESSION_PICKUP_STATUS (\S+)
Value HA_SESSION_PICKUP_DELAY (\S+)
Value HA_OVERRIDE_STATUS (\S+)
Value HA_MASTER_UNIT_NAME (\S+)
Value HA_SLAVE_UNIT_NAME (\S+)
Value HA_MASTER_UNIT_SERIAL (\S+)
Value HA_SLAVE_UNIT_SERIAL (\S+)
Value HA_MASTER_UNIT_INDEX (\S+)
Value HA_SLAVE_UNIT_INDEX (\S+)

Start
  ^HA\s+Health\s+Status:\s+${HA_HEALTH}
  ^Model:\s+${MODEL}
  ^Mode:\s+${HA_MODE}
  ^Group:\s+${HA_GROUP}
  ^Debug:\s+\d+
  ^Cluster\s+Uptime:\s+${CLUSTER_UPTIME}
  ^Cluster\s+state\s+change\s+time:\s+${CLUSTER_STATE_CHANGED_TIME}
  ^(Master|Primary)\s+selected\s+using:
  ^\s*<\S+
  ^ses_pickup:\s+${HA_SESSION_PICKUP_STATUS},\s+ses_pickup_delay=${HA_SESSION_PICKUP_DELAY}
  ^override:\s+${HA_OVERRIDE_STATUS}
  ^Configuration\s+Status: -> Configuration_Status
  # Catch old 6.0_noha with no "Configuraton Status"
  ^System\s+Usage\s+stats: -> System_Usage_stats
  ^. -> Error "in-Start"

Configuration_Status
  ^System\s+Usage\s+stats: -> System_Usage_stats
  ^\s*\S+([\S\s]+):\s\S+$$
  ^. -> Error "in-Configuration_Status"

System_Usage_stats
  ^HBDEV\s+stats: -> HBDEV_MONDEV_stats
  ^\s*\S+([\S\s]+):$$
  #^\s*\S+:\s+
  ^\s*sessions=
  ^. -> Error "in-System_Usage_stats"

HBDEV_MONDEV_stats
  # Combine stats, no MONDEV in older FW's
  ^\s*\S+([\S\s]+):$$
  ^\s*\S+:\s.+rx.+tx.+$$
  ^MONDEV\s+stats:
  ^(Master|Primary)\s*:\s+${HA_MASTER_UNIT_NAME}\s*,\s+${HA_MASTER_UNIT_SERIAL},\s+(HA\s+cluster\s+index|cluster\s+index)\s+=\s+${HA_MASTER_UNIT_INDEX}
  ^(Slave|Secondary)\s*:\s+${HA_SLAVE_UNIT_NAME}\s*,\s+${HA_SLAVE_UNIT_SERIAL},\s+(|HA)\scluster\s+index\s+=\s+${HA_SLAVE_UNIT_INDEX}
  ^number\s+of\s+vcluster:\s+\d+
  ^vcluster\s+\d+:
  ^(Master|Slave|Primary|Secondary)\s:\s+\S+,\s+(operating\s+cluster\s+index|HA\s+operating\s+index)\s+=\s+\d+ -> Record
  ^\s*$$
  ^. -> Error "in-HBDEV_MONDEV_stats"

SAMPLE COMMAND OUTPUT

HA Health Status:
WARNING: FGT40FYYYYYYYYYY has mondev down;
Model: FortiGate-40F
Mode: HA A-P
Group: 172
Debug: 0
Cluster Uptime: 63 days 22:15:42
Cluster state change time: 2024-02-11 15:25:27
Primary selected using:
<2024/02/11 15:25:27> FGT40FXXXXXXXXXX is selected as the primary because the value 0 of link-failure + pingsvr-failure is less than peer member FGT40FYYYYYYYYYY.
ses_pickup: enable, ses_pickup_delay=disable
override: enable
Configuration Status:
FGT40FXXXXXXXXXX(updated 0 seconds ago): in-sync
FGT40FYYYYYYYYYY(updated 0 seconds ago): in-sync
System Usage stats:
FGT40FXXXXXXXXXX(updated 0 seconds ago):
sessions=768, average-cpu-user/nice/system/idle=0%/0%/0%/99%, memory=35%
FGT40FYYYYYYYYYY(updated 0 seconds ago):
sessions=634, average-cpu-user/nice/system/idle=0%/0%/0%/100%, memory=31%
HBDEV stats:
FGT40FXXXXXXXXXX(updated 0 seconds ago):
lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=9997131732/27616386/0/0, tx=10080077920/27616652/0/0
lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=11772621099/36693306/0/0, tx=26151306122/60128423/0/0
FGT40FYYYYYYYYYY(updated 0 seconds ago):
lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=10080077920/27616652/0/0, tx=9997131732/27616386/0/0
lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=26151777728/60128423/0/0, tx=11771044717/36693306/0/0
MONDEV stats:
FGT40FXXXXXXXXXX(updated 0 seconds ago):
lan1: physical/100auto, up, rx-bytes/packets/dropped/errors=535463275509/3388288017/0/0, tx=3023591767050/4114831127/0/0
wan: physical/100auto, up, rx-bytes/packets/dropped/errors=3314385262333/4439875482/0/0, tx=768352772861/3445252569/0/0
FGT40FYYYYYYYYYY(updated 0 seconds ago):
lan1: physical/00, down, rx-bytes/packets/dropped/errors=0/0/0/0, tx=0/0/0/0
wan: physical/100auto, up, rx-bytes/packets/dropped/errors=15792718293/245544650/0/0, tx=0/0/0/0
Primary : FGT-fw-a, FGT40FXXXXXXXXXX, HA cluster index = 1
Secondary : FGT-fw-b, FGT40FYYYYYYYYYY, HA cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.2
Primary: FGT40FXXXXXXXXXX, HA operating index = 0
Secondary: FGT40FYYYYYYYYYY, HA operating index = 1

SUMMARY

Traceback (most recent call last):
  File "C:\Users\Admin\Scripts_py\Netmiko\fortinet.py", line 16, in <module>
    command_parsed = parse_output(platform = "fortinet", command = "get system ha status", data = output)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\ntc_templates\parse.py", line 61, in parse_output
    cli_table.ParseCmd(data, attrs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 282, in ParseCmd
    self.table = self._ParseCmdItem(self.raw, template_file=template_files[0])
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 315, in _ParseCmdItem
    for record in fsm.ParseText(cmd_input):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 895, in ParseText
    self._CheckLine(line)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 944, in _CheckLine
    if self._Operations(rule, line):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 1021, in _Operations
    raise TextFSMError('Error: %s. Rule Line: %s. Input Line: %s.'
textfsm.parser.TextFSMError: Error: "in-Start". Rule Line: 36. Input Line: HA Health Status: .

STEPS TO REPRODUCE

Trigger a non-"OK" HA Health Status, which the CLI prints across two lines. For example, take down the lan1 link (the HA monitor interface) on the slave node.

EXPECTED RESULTS

Get the current value of HA Health Status:

parsed_sample:
  - ha_health: "WARNING: FGT40FYYYYYYYYYY has mondev down"

and continue executing the script.
ACTUAL RESULTS
Traceback (most recent call last):
  File "C:\Users\Admin\Scripts_py\Netmiko\fortinet.py", line 16, in <module>
    command_parsed = parse_output(platform = "fortinet", command = "get system ha status", data = output)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\ntc_templates\parse.py", line 61, in parse_output
    cli_table.ParseCmd(data, attrs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 282, in ParseCmd
    self.table = self._ParseCmdItem(self.raw, template_file=template_files[0])
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 315, in _ParseCmdItem
    for record in fsm.ParseText(cmd_input):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 895, in ParseText
    self._CheckLine(line)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 944, in _CheckLine
    if self._Operations(rule, line):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 1021, in _Operations
    raise TextFSMError('Error: %s. Rule Line: %s. Input Line: %s.'
textfsm.parser.TextFSMError: Error: "in-Start". Rule Line: 36. Input Line: HA Health Status: .

pnpestov, Apr 15 '24 11:04

Sorry, I forgot to specify the firmware version. Version: FortiGate-40F v7.0.13,build0566,231024 (GA.M)

pnpestov, Apr 15 '24 11:04

@pnpestov If I'm not mistaken, code blocks preserve trailing whitespace and the like. Plus, raw CLI output is generally just easier to read in a code block.

If you're open to building up changes and submitting a pull request (PR), that's the best way to make sure these bugfixes get merged in.

mjbear, Jul 27 '24 00:07

@pnpestov Once I looked at the template I realized why the first post on this thread had the markdown it did. A code block would have been extremely helpful to prevent that. :grinning:

I used the raw output (I put it in a code block below) that was in the first post to work against (in hopes no whitespace/formatting was lost). :man_shrugging:

Ultimately, my solution works like this: I transition to a new state, capture the "warning" line using the trailing semicolon ; as a regex anchor (required, or things get weird), then capture Model and use that line to transition back to Start. :sweat_smile:

:tada: I end up with the following structured output:

---
parsed_sample:
  - cluster_state_changed_time: "2024-02-11 15:25:27"
    cluster_uptime: "63 days 22:15:42"
    ha_group: "172"
    ha_health: "WARNING: FGT40FYYYYYYYYYY has mondev down"
    ha_master_unit_index: "1" 
    ha_master_unit_name: "FGT-fw-a"
    ha_master_unit_serial: "FGT40FXXXXXXXXXX"
    ha_mode: "HA A-P"
    ha_override_status: "enable"
    ha_session_pickup_delay: "disable"
    ha_session_pickup_status: "enable"
    ha_slave_unit_index: "0" 
    ha_slave_unit_name: "FGT-fw-b"
    ha_slave_unit_serial: "FGT40FYYYYYYYYYY"
    model: "FortiGate-40F"

Raw output from first post:

HA Health Status:
WARNING: FGT40FYYYYYYYYYY has mondev down;
Model: FortiGate-40F
Mode: HA A-P
Group: 172
Debug: 0
Cluster Uptime: 63 days 22:15:42
Cluster state change time: 2024-02-11 15:25:27
Primary selected using:
<2024/02/11 15:25:27> FGT40FXXXXXXXXXX is selected as the primary because the value 0 of link-failure + pingsvr-failure is less than peer member FGT40FYYYYYYYYYY.
ses_pickup: enable, ses_pickup_delay=disable
override: enable
Configuration Status:
FGT40FXXXXXXXXXX(updated 0 seconds ago): in-sync
FGT40FYYYYYYYYYY(updated 0 seconds ago): in-sync
System Usage stats:
FGT40FXXXXXXXXXX(updated 0 seconds ago):
sessions=768, average-cpu-user/nice/system/idle=0%/0%/0%/99%, memory=35%
FGT40FYYYYYYYYYY(updated 0 seconds ago):
sessions=634, average-cpu-user/nice/system/idle=0%/0%/0%/100%, memory=31%
HBDEV stats:
FGT40FXXXXXXXXXX(updated 0 seconds ago):
lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=9997131732/27616386/0/0, tx=10080077920/27616652/0/0
lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=11772621099/36693306/0/0, tx=26151306122/60128423/0/0
FGT40FYYYYYYYYYY(updated 0 seconds ago):
lan2: physical/1000auto, up, rx-bytes/packets/dropped/errors=10080077920/27616652/0/0, tx=9997131732/27616386/0/0
lan3: physical/1000auto, up, rx-bytes/packets/dropped/errors=26151777728/60128423/0/0, tx=11771044717/36693306/0/0
MONDEV stats:
FGT40FXXXXXXXXXX(updated 0 seconds ago):
lan1: physical/100auto, up, rx-bytes/packets/dropped/errors=535463275509/3388288017/0/0, tx=3023591767050/4114831127/0/0
wan: physical/100auto, up, rx-bytes/packets/dropped/errors=3314385262333/4439875482/0/0, tx=768352772861/3445252569/0/0
FGT40FYYYYYYYYYY(updated 0 seconds ago):
lan1: physical/00, down, rx-bytes/packets/dropped/errors=0/0/0/0, tx=0/0/0/0
wan: physical/100auto, up, rx-bytes/packets/dropped/errors=15792718293/245544650/0/0, tx=0/0/0/0
Primary : FGT-fw-a, FGT40FXXXXXXXXXX, HA cluster index = 1
Secondary : FGT-fw-b, FGT40FYYYYYYYYYY, HA cluster index = 0
number of vcluster: 1
vcluster 1: work 169.254.0.2
Primary: FGT40FXXXXXXXXXX, HA operating index = 0
Secondary: FGT40FYYYYYYYYYY, HA operating index = 1

mjbear, Jul 28 '24 16:07

@pnpestov Submitted PR #1791

mjbear, Jul 28 '24 17:07

Hello! Thanks for your reply! But the following situation is also possible:

ftg-fw-a # get sys ha status
HA Health Status:
WARNING: FGTXXXXXXXXXXXX has hbdev down;
WARNING: FGTYYYYYYYYYYYYY has hbdev down;
Model: FortiGate-40F
Mode: HA A-P
...

Error:

Traceback (most recent call last):
  File "C:\Users\Admin\Scripts_py\Netmiko\fortinet.py", line 127, in <module>
    command_parsed = parse_output(platform = "fortinet", command = "get system ha status", data = output)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\ntc_templates\parse.py", line 77, in parse_output
    cli_table.ParseCmd(data, attrs)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 282, in ParseCmd
    self.table = self._ParseCmdItem(self.raw, template_file=template_files[0])
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\clitable.py", line 315, in _ParseCmdItem
    for record in fsm.ParseText(cmd_input):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 895, in ParseText
    self._CheckLine(line)
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 944, in _CheckLine
    if self._Operations(rule, line):
  File "C:\Users\Admin\AppData\Roaming\Python\Python310\site-packages\textfsm\parser.py", line 1021, in _Operations
    raise TextFSMError('Error: %s. Rule Line: %s. Input Line: %s.'
textfsm.parser.TextFSMError: Error: "in-Start". Rule Line: 37. Input Line: WARNING: FGTХХХХХХХХХХХХХ has hbdev down; .

pnpestov, Sep 30 '24 19:09

> Hello! Thanks for your reply! But the following situation is also possible:
> ftg-fw-a # get sys ha status
> HA Health Status:
> WARNING: FGTXXXXXXXXXXXX has hbdev down;
> WARNING: FGTYYYYYYYYYYYYY has hbdev down;
> Model: FortiGate-40F
> Mode: HA A-P
> ...

Only two months after the PR was merged in, haha. :wink:

@pnpestov Please open up a new issue ticket to cover the bases here.

If you provide the :point_right: full raw output (sanitized of any private details, e.g. serial numbers) on that new issue, I'll take a look at working up changes for this, unless you want to work up a PR yourself.

mjbear, Sep 30 '24 20:09

@pnpestov I ran this through textfsm and there doesn't appear to be anything wrong with the template (see below) based on the snippet of raw output you provided above.

Please check that you're using a current version (or git clone) of ntc-templates. Thank you.

[
	{
		"CLUSTER_STATE_CHANGED_TIME": "",
		"CLUSTER_UPTIME": "",
		"HA_GROUP": "",
		"HA_HEALTH": "WARNING: FGTYYYYYYYYYYYYY has hbdev down",
		"HA_MASTER_UNIT_INDEX": "",
		"HA_MASTER_UNIT_NAME": "",
		"HA_MASTER_UNIT_SERIAL": "",
		"HA_MODE": "HA A-P",
		"HA_OVERRIDE_STATUS": "",
		"HA_SESSION_PICKUP_DELAY": "",
		"HA_SESSION_PICKUP_STATUS": "",
		"HA_SLAVE_UNIT_INDEX": "",
		"HA_SLAVE_UNIT_NAME": "",
		"HA_SLAVE_UNIT_SERIAL": "",
		"MODEL": "FortiGate-40F"
	}
]
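
One way to confirm which release is actually installed is to ask the Python packaging metadata directly. The sketch below uses only the standard library; the helper name `installed_version` is made up for illustration, and it assumes the package was installed with pip under the distribution name `ntc-templates` (a plain git clone that was never installed will not show up this way).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as published on PyPI.
    for name in ("ntc-templates", "textfsm"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```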

mjbear, Sep 30 '24 20:09

@mjbear Thanks for the prompt response! I'm using the current version of ntc-templates. I noticed that the GitHub editor removes the whitespace characters before WARNING, so your test ran against output that differs from the real CLI output. The actual output in the CLI is as follows:

HA Health Status:
    WARNING: FGTXXXXXXXXXXXXX has hbdev down;
    WARNING: FGTYYYYYYYYYYYYY has hbdev down;
Model: FortiGate-40F
Mode: HA A-P

I looked at the template; most likely the error occurs simply because of the leading whitespace characters.
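
The effect of leading whitespace can be illustrated with plain `re`. Both patterns below are hypothetical stand-ins for a template rule, not the actual rule in ntc-templates, and the four-space indent is an assumption about what the CLI prints.

```python
import re

# Hypothetical rule anchored at column 0: fails on indented lines.
STRICT_RULE = re.compile(r"^WARNING:\s+(?P<msg>.+);\s*$")
# Hypothetical rule that tolerates optional leading whitespace.
TOLERANT_RULE = re.compile(r"^\s*WARNING:\s+(?P<msg>.+);\s*$")

# Assumed CLI line (indent width is a guess) and its whitespace-stripped form.
cli_line = "    WARNING: FGTXXXXXXXXXXXXX has hbdev down;"
stripped_line = cli_line.lstrip()


def matches(rule: re.Pattern, line: str) -> bool:
    """True when the rule matches from the start of the line."""
    return rule.match(line) is not None


if __name__ == "__main__":
    for label, line in (("cli", cli_line), ("stripped", stripped_line)):
        print(label, matches(STRICT_RULE, line), matches(TOLERANT_RULE, line))
```

A rule tested only against the stripped text can therefore pass in testing and still fail on the device output.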

pnpestov, Oct 01 '24 08:10

> @mjbear Thanks for the prompt response! I'm using the current version of ntc-templates. I noticed that the GitHub editor removes the whitespace characters before WARNING. It turns out that you are conducting a test with an incorrect output. In fact, the output in the CLI is as follows:

Most welcome. I can say with complete certainty that the development for PR #1791 was done on my local machine, not in the GitHub editor.

:bulb: Ah, it was the output posted in this thread that had the whitespace stripped. Should have used code blocks. (Oh well, things happen, it's ok.)

> HA Health Status:
> WARNING: FGTXXXXXXXXXXXXX has hbdev down;
> WARNING: FGTYYYYYYYYYYYYY has hbdev down;
> Model: FortiGate-40F
> Mode: HA A-P
>
> I looked at the template handler, most likely the error occurs just because of the whitespace characters.

:dart: Would you mind performing the following steps: :question:

  1. Gather the full output from get system ha status
  2. Open a new issue ticket
  3. Place that (sanitized) raw output in a code block (by using the <> icon or triple backticks ```) within the new issue ticket
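
Before pasting, a quick sanity check that leading and trailing whitespace survived the copy is to print each line with `repr()`, which makes spaces and tabs visible. This is only a suggestion; the helper name `show_whitespace` is made up for illustration.

```python
def show_whitespace(raw_output: str) -> list[str]:
    """Return each line wrapped in quotes so whitespace is visible."""
    return [repr(line) for line in raw_output.splitlines()]


if __name__ == "__main__":
    # Assumed example text; replace with the captured CLI output.
    captured = "HA Health Status:\n    WARNING: FGTXXXXXXXXXXXXX has hbdev down;\n"
    for shown in show_whitespace(captured):
        print(shown)
```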

I'd be glad to complete this fix once and for all provided I have full output and everything requested.

mjbear, Oct 01 '24 14:10

Yes, of course! New issue ticket - https://github.com/networktocode/ntc-templates/issues/1859

pnpestov, Oct 01 '24 15:10