SONiC
Mellanox SN2700 SONiC docker services fail to start, HwSKU "None" causes Python error
Salutations!
We are attempting to run SONiC on a Mellanox SN2700 switch. Several of the docker services fail to start. With my limited troubleshooting ability, I believe I have determined that the HwSKU is not being detected properly. Other posts and discussions I have found suggest that old firmware might be to blame, but without access to an MLNX-OS .bin file, I can't switch over to that OS and perform a firmware update. Please correct me if I am wrong, but my understanding is that MLNX-OS is the only way to update the firmware on these devices.
Is there something else wrong, perhaps? Thanks for any assistance in advance! Please let me know if there is any more information I can provide for clarity.
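For context on why the HwSKU can come out as None: as far as I understand the sonic-buildimage layout, a fresh install takes its default HwSKU from a default_sku file inside the platform's device directory. The sketch below assumes that layout (/usr/share/sonic/device/<platform>/default_sku, with the HwSKU as the first token). find_default_hwsku is a hypothetical helper, not a sonic_py_common function. If no directory matches the platform string the switch reports, the lookup yields None.

```python
import os
from typing import Optional

# Assumed root of the per-platform device directories on a SONiC host.
DEVICE_ROOT = "/usr/share/sonic/device"


def find_default_hwsku(platform: str, device_root: str = DEVICE_ROOT) -> Optional[str]:
    """Hypothetical helper: return the first token of
    <device_root>/<platform>/default_sku, or None if it cannot be found."""
    sku_file = os.path.join(device_root, platform, "default_sku")
    if not os.path.isfile(sku_file):
        # No directory (or no default_sku) for this platform string:
        # the case in which the HwSKU would plausibly end up as None.
        return None
    with open(sku_file) as f:
        tokens = f.read().split()
    return tokens[0] if tokens else None


if __name__ == "__main__":
    # Platform string as reported by "show version" in this thread.
    print(find_default_hwsku("x86_64-mlnx_x86-r5.0.1410"))
```

Running it on the switch and listing /usr/share/sonic/device shows whether a directory exists for the platform string that "show version" reports.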
show techsupport dump located here (expires Dec 3, 2022).
admin@sonic:~$ show version
SONiC Software Version: SONiC.master.168762-a31a4e7f8
Distribution: Debian 11.5
Kernel: 5.10.0-12-2-amd64
Build commit: a31a4e7f8
Build date: Wed Nov 2 17:48:12 UTC 2022
Built by: AzDevOps@sonic-build-workers-002BS0
Platform: x86_64-mlnx_x86-r5.0.1410
HwSKU: None
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1702K06506
Model Number: MSN2700-CS2F
Hardware Revision: A2
Uptime: 16:20:34 up 16:51, 2 users, load average: 0.17, 0.25, 0.18
Date: Thu 03 Nov 2022 16:20:34
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-syncd-mlnx latest 7a7677abf201 867MB
docker-syncd-mlnx master.168762-a31a4e7f8 7a7677abf201 867MB
docker-platform-monitor latest c6a6b6ac28c4 875MB
docker-platform-monitor master.168762-a31a4e7f8 c6a6b6ac28c4 875MB
docker-orchagent latest ae126e856887 486MB
docker-orchagent master.168762-a31a4e7f8 ae126e856887 486MB
docker-fpm-frr latest 0ded2429da04 498MB
docker-fpm-frr master.168762-a31a4e7f8 0ded2429da04 498MB
docker-teamd latest c1ddc6677e3b 468MB
docker-teamd master.168762-a31a4e7f8 c1ddc6677e3b 468MB
docker-macsec latest 7c7c3b31165f 470MB
docker-dhcp-relay latest 7b8a8e3ae7bd 461MB
docker-eventd latest e655bf03eeb0 451MB
docker-eventd master.168762-a31a4e7f8 e655bf03eeb0 451MB
docker-sonic-p4rt latest f072348333dd 534MB
docker-sonic-p4rt master.168762-a31a4e7f8 f072348333dd 534MB
docker-snmp latest 9938d819ece8 498MB
docker-snmp master.168762-a31a4e7f8 9938d819ece8 498MB
docker-database latest 420d50b4ee8a 452MB
docker-database master.168762-a31a4e7f8 420d50b4ee8a 452MB
docker-sonic-telemetry latest 8616add6b988 746MB
docker-sonic-telemetry master.168762-a31a4e7f8 8616add6b988 746MB
docker-router-advertiser latest 88c779b21304 452MB
docker-router-advertiser master.168762-a31a4e7f8 88c779b21304 452MB
docker-mux latest 324ad018c755 500MB
docker-mux master.168762-a31a4e7f8 324ad018c755 500MB
docker-lldp latest 3385e2edd2cc 494MB
docker-lldp master.168762-a31a4e7f8 3385e2edd2cc 494MB
docker-nat latest 9444b720dd96 439MB
docker-nat master.168762-a31a4e7f8 9444b720dd96 439MB
docker-sflow latest 4e13ae56f727 437MB
docker-sflow master.168762-a31a4e7f8 4e13ae56f727 437MB
docker-sonic-mgmt-framework latest c21d7367e9a1 570MB
docker-sonic-mgmt-framework master.168762-a31a4e7f8 c21d7367e9a1 570MB
admin@sonic:~$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5f2575f29394 docker-sonic-telemetry:latest "/usr/local/bin/supe…" 17 hours ago Up 17 hours telemetry
fad615c4f175 docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 17 hours ago Up 17 hours mgmt-framework
f2fb3164424c docker-lldp:latest "/usr/bin/docker-lld…" 17 hours ago Up 17 hours lldp
f952e79283dc docker-platform-monitor:latest "/usr/bin/docker_ini…" 17 hours ago Up 17 hours pmon
ea8619a54edc docker-router-advertiser:latest "/usr/bin/docker-ini…" 2 months ago Up 2 months radv
168008e39980 docker-eventd:latest "/usr/local/bin/supe…" 2 months ago Up 2 months eventd
33cff425ea3b docker-database:latest "/usr/local/bin/dock…" 2 months ago Up 2 months database
admin@sonic:~$ show platform syseeprom
TlvInfo Header:
Id String: TlvInfo
Version: 1
Total Length: 584
TLV Name Code Len Value
---------------- ------ ----- ------
Product Name 0x21 64 MSN2700
Part Number 0x22 20 MSN2700-CS2F
Serial Number 0x23 24 MT1702K06506
Base MAC Address 0x24 6 24:8A:07:85:49:00
Manufacture Date 0x25 19 01/12/2017 14:29:58
Device Version 0x26 1 0
Platform Name 0x28 64 x86_64-mlnx_x86-r0
ONIE Version 0x29 32 5.0.1404
MAC Addresses 0x2A 2 128
Manufacturer 0x2B 8 Mellanox
admin@sonic:~$ sudo tail -n 50 /var/log/syslog
Nov 3 16:22:14.004926 sonic NOTICE systemd[1]: hostcfgd.service: Main process exited, code=exited, status=1/FAILURE
Nov 3 16:22:14.005107 sonic WARNING systemd[1]: hostcfgd.service: Failed with result 'exit-code'.
Nov 3 16:22:14.009191 sonic INFO systemd[1]: Started Host config enforcer daemon.
Nov 3 16:22:14.009533 sonic NOTICE systemd[1]: switch state service is not active.
Nov 3 16:22:14.009653 sonic WARNING systemd[1]: Dependency failed for SNMP container.
Nov 3 16:22:14.009756 sonic NOTICE systemd[1]: snmp.service: Job snmp.service/start failed with result 'dependency'.
Nov 3 16:22:14.012826 sonic NOTICE systemd[1]: switch state service is not active.
Nov 3 16:22:14.012988 sonic WARNING systemd[1]: Dependency failed for SNMP container.
Nov 3 16:22:14.013091 sonic NOTICE systemd[1]: snmp.service: Job snmp.service/start failed with result 'dependency'.
Nov 3 16:22:14.300857 sonic INFO hostcfgd: ConfigDB connect success
Nov 3 16:22:14.313485 sonic INFO hostcfgd[78909]: Traceback (most recent call last):
Nov 3 16:22:14.313613 sonic INFO hostcfgd[78909]: File "/usr/local/bin/hostcfgd", line 1678, in <module>
Nov 3 16:22:14.314269 sonic INFO hostcfgd[78909]: main()
Nov 3 16:22:14.314367 sonic INFO hostcfgd[78909]: File "/usr/local/bin/hostcfgd", line 1673, in main
Nov 3 16:22:14.314964 sonic INFO hostcfgd[78909]: daemon = HostConfigDaemon()
Nov 3 16:22:14.315189 sonic INFO hostcfgd[78909]: File "/usr/local/bin/hostcfgd", line 1466, in __init__
Nov 3 16:22:14.315573 sonic INFO hostcfgd[78909]: self.feature_handler = FeatureHandler(self.config_db, feature_state_table, self.device_config)
Nov 3 16:22:14.315797 sonic INFO hostcfgd[78909]: File "/usr/local/bin/hostcfgd", line 202, in __init__
Nov 3 16:22:14.316121 sonic INFO hostcfgd[78909]: self._device_running_config = device_info.get_device_runtime_metadata()
Nov 3 16:22:14.316307 sonic INFO hostcfgd[78909]: File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 478, in get_device_runtime_metadata
Nov 3 16:22:14.316485 sonic INFO hostcfgd[78909]: port_metadata = {'ETHERNET_PORTS_PRESENT': True if get_path_to_port_config_file(hwsku=None, asic="0" if is_multi_npu() else None) else False}
Nov 3 16:22:14.316666 sonic INFO hostcfgd[78909]: File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 299, in get_path_to_port_config_file
Nov 3 16:22:14.317551 sonic INFO hostcfgd[78909]: (platform_path, hwsku_path) = get_paths_to_platform_and_hwsku_dirs()
Nov 3 16:22:14.317846 sonic INFO hostcfgd[78909]: File "/usr/local/lib/python3.9/dist-packages/sonic_py_common/device_info.py", line 265, in get_paths_to_platform_and_hwsku_dirs
Nov 3 16:22:14.318044 sonic INFO hostcfgd[78909]: hwsku_path = os.path.join(platform_path, hwsku)
Nov 3 16:22:14.318237 sonic INFO hostcfgd[78909]: File "/usr/lib/python3.9/posixpath.py", line 90, in join
Nov 3 16:22:14.318418 sonic INFO hostcfgd[78909]: genericpath._check_arg_types('join', a, *p)
Nov 3 16:22:14.318642 sonic INFO hostcfgd[78909]: File "/usr/lib/python3.9/genericpath.py", line 152, in _check_arg_types
Nov 3 16:22:14.318833 sonic INFO hostcfgd[78909]: raise TypeError(f'{funcname}() argument must be str, bytes, or '
Nov 3 16:22:14.319018 sonic INFO hostcfgd[78909]: TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
Nov 3 16:22:14.348945 sonic NOTICE systemd[1]: hostcfgd.service: Main process exited, code=exited, status=1/FAILURE
Nov 3 16:22:14.349139 sonic WARNING systemd[1]: hostcfgd.service: Failed with result 'exit-code'.
Nov 3 16:22:14.351433 sonic WARNING systemd[1]: hostcfgd.service: Start request repeated too quickly.
Nov 3 16:22:14.351563 sonic WARNING systemd[1]: hostcfgd.service: Failed with result 'exit-code'.
Nov 3 16:22:14.351660 sonic ERR systemd[1]: Failed to start Host config enforcer daemon.
Nov 3 16:22:14.351760 sonic NOTICE systemd[1]: switch state service is not active.
Nov 3 16:22:14.351870 sonic WARNING systemd[1]: Dependency failed for SNMP container.
Nov 3 16:22:14.351964 sonic NOTICE systemd[1]: snmp.service: Job snmp.service/start failed with result 'dependency'.
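The last frames of the traceback show the mechanism: get_paths_to_platform_and_hwsku_dirs() passes hwsku directly to os.path.join(), which raises TypeError when given None. The snippet reproduces that with plain Python and shows a None-guarded variant. platform_and_hwsku_paths is a hypothetical illustration, not a patch to sonic_py_common; a guard like this would only avoid the crash and would not supply the missing HwSKU.

```python
import os
from typing import Optional, Tuple


def platform_and_hwsku_paths(platform_path: str, hwsku: Optional[str]) -> Tuple[str, Optional[str]]:
    """Hypothetical None-safe variant of the join that fails in the log.

    Returns (platform_path, None) instead of raising when hwsku is None.
    """
    if hwsku is None:
        return platform_path, None
    return platform_path, os.path.join(platform_path, hwsku)


if __name__ == "__main__":
    platform_path = "/usr/share/sonic/device/x86_64-mlnx_x86-r5.0.1410"

    # Reproduce the failure mode from the traceback: os.path.join rejects None.
    try:
        os.path.join(platform_path, None)
    except TypeError as exc:
        print("TypeError:", exc)

    # The guarded variant returns None for the HwSKU path instead of raising.
    print(platform_and_hwsku_paths(platform_path, None))
```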
SwSS is not active; you may want to check the syncd container.
Can you share the output of docker ps -a?
Thanks.
Here is the output of docker ps --all:
admin@sonic:~$ docker ps --all
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5f2575f29394 docker-sonic-telemetry:latest "/usr/local/bin/supe…" 19 hours ago Up 19 hours telemetry
fad615c4f175 docker-sonic-mgmt-framework:latest "/usr/local/bin/supe…" 19 hours ago Up 19 hours mgmt-framework
f2fb3164424c docker-lldp:latest "/usr/bin/docker-lld…" 19 hours ago Up 19 hours lldp
f952e79283dc docker-platform-monitor:latest "/usr/bin/docker_ini…" 19 hours ago Up 19 hours pmon
ea8619a54edc docker-router-advertiser:latest "/usr/bin/docker-ini…" 2 months ago Up 2 months radv
168008e39980 docker-eventd:latest "/usr/local/bin/supe…" 2 months ago Up 2 months eventd
33cff425ea3b docker-database:latest "/usr/local/bin/dock…" 2 months ago Up 2 months database
And here's a grep for occurrences of syncd in the syslog:
admin@sonic:~$ sudo cat /var/log/syslog | grep syncd
Nov 3 14:10:05.769254 sonic ERR monit[453]: 'container_checker' status failed (3) -- Expected containers not running: mux, snmp, dhcp_relay, syncd, swss, teamd, bgp
Nov 3 14:10:06.806861 sonic NOTICE python3: :- publish: EVENT_PUBLISHED: {"sonic-events-host:event-down-ctr":{"ctr_name":"syncd","timestamp":"2022-11-03T14:10:06.806710Z"}}
Nov 3 14:11:05.861950 sonic ERR monit[453]: 'container_checker' status failed (3) -- Expected containers not running: swss, teamd, dhcp_relay, mux, snmp, bgp, syncd
Nov 3 14:11:06.405897 sonic NOTICE python3: :- publish: EVENT_PUBLISHED: {"sonic-events-host:event-down-ctr":{"ctr_name":"syncd","timestamp":"2022-11-03T14:11:06.405217Z"}}
Nov 3 14:12:05.893813 sonic ERR monit[453]: 'container_checker' status failed (3) -- Expected containers not running: mux, syncd, swss, teamd, dhcp_relay, bgp, snmp
Nov 3 14:12:06.444846 sonic NOTICE python3: :- publish: EVENT_PUBLISHED: {"sonic-events-host:event-down-ctr":{"ctr_name":"syncd","timestamp":"2022-11-03T14:12:06.444652Z"}}
Nov 3 14:13:05.925553 sonic ERR monit[453]: 'container_checker' status failed (3) -- Expected containers not running: snmp, swss, teamd, syncd, bgp, mux, dhcp_relay
Nov 3 14:13:06.516344 sonic NOTICE python3: :- publish: EVENT_PUBLISHED: {"sonic-events-host:event-down-ctr":{"ctr_name":"syncd","timestamp":"2022-11-03T14:13:06.515439Z"}}
...
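Rather than grepping for one container at a time, the monit container_checker lines can be tallied to see which containers are consistently missing. This is a minimal sketch that assumes the "Expected containers not running: a, b, c" message format shown above stays the same; the regex and missing_containers are illustrative and not part of SONiC.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the monit container_checker message format seen in the syslog excerpt.
NOT_RUNNING_RE = re.compile(r"Expected containers not running: (.+)$")


def missing_containers(lines: Iterable[str]) -> Counter:
    """Count how often each container is reported as not running."""
    counts: Counter = Counter()
    for line in lines:
        match = NOT_RUNNING_RE.search(line)
        if match:
            names = [name.strip() for name in match.group(1).split(",")]
            counts.update(name for name in names if name)
    return counts


if __name__ == "__main__":
    # Reading the syslog usually requires root: sudo python3 <script>.py
    with open("/var/log/syslog", errors="replace") as log:
        for name, count in missing_containers(log).most_common():
            print(f"{name}: {count}")
```

Containers that appear in every check (here syncd, swss, bgp, teamd and friends) are the ones that never came up, as opposed to ones that merely restarted.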
Yes, it looks like none of the essential containers are running properly; they are all crashing. You may want to get a "tested/stable" image from the Mellanox switch support team.
syncd programs the ASIC through SAI, so technically anything in the SAI layer could be the problem.
Can you check the BIOS version with dmidecode? I had problems running SONiC on an SN2700, but updating the BIOS to a 2018 version solved them (at least SONiC no longer complains that the platform is not supported).
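To check the BIOS as suggested, dmidecode can print single fields with -s bios-version and -s bios-release-date (for example, sudo dmidecode -s bios-release-date). The sketch below wraps those two calls and parses the date. It assumes the usual SMBIOS MM/DD/YYYY (or MM/DD/YY) date format and must run as root; the 2018 cut-off only mirrors the comment above and is not a documented minimum version.

```python
import subprocess
from datetime import datetime


def parse_bios_date(text: str) -> datetime:
    """Parse an SMBIOS BIOS release date such as '05/21/2018' or '05/21/18'."""
    value = text.strip()
    for fmt in ("%m/%d/%Y", "%m/%d/%y"):
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized BIOS date: {value!r}")


def read_bios_info() -> dict:
    """Query dmidecode (requires root) for the BIOS version and release date."""
    info = {}
    for key in ("bios-version", "bios-release-date"):
        result = subprocess.run(
            ["dmidecode", "-s", key], capture_output=True, text=True, check=True
        )
        info[key] = result.stdout.strip()
    return info


if __name__ == "__main__":
    bios = read_bios_info()
    released = parse_bios_date(bios["bios-release-date"])
    print(f"BIOS {bios['bios-version']} released {released:%Y-%m-%d}")
    # Threshold taken only from the "2018 version" mentioned in the thread.
    if released.year < 2018:
        print("BIOS predates 2018; an update may be worth trying.")
```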