[sflow] hsflowd sometimes logs an error during the deinit sequence when it accesses an interface after removal
Description
This is a timing-dependent issue that occurs very rarely: during the deinitialization sequence, hsflowd occasionally accesses a net device after it has already been removed. The error log below is observed (a sketch of the failing ioctl follows the log excerpt):
Apr 24 06:31:03.969346 r-leopard-56 INFO container: docker cmd: wait for syncd
Apr 24 06:31:03.970278 r-leopard-56 INFO container: docker cmd: stop for syncd
Apr 24 06:31:03.971268 r-leopard-56 DEBUG container: container_stop: END
Apr 24 06:31:04.021434 r-leopard-56 NOTICE root: Stopped syncd service...
Apr 24 06:31:04.251818 r-leopard-56 INFO kernel: [ 4653.759491] sxd_kernel: sx_netdev_cleanup
Apr 24 06:31:05.898447 r-leopard-56 ERR sflow#hsflowd: device Ethernet0 Get SIOCGIFFLAGS failed : No such device
Apr 24 06:31:06.919803 r-leopard-56 INFO kernel: [ 4656.426494] sxd_kernel: sx_core_cleanup_module
Apr 24 06:31:06.919837 r-leopard-56 NOTICE kernel: [ 4656.426516] sxd_kernel: remove a PCI core device
Apr 24 06:31:06.987789 r-leopard-56 NOTICE kernel: [ 4656.498085] sxd_kernel: sending UDEV event from sx-core driver [bus=0000:07:00.0, action=1 (KOBJ_REMOVE), origin='Unregistering from PCI']
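For context, a minimal sketch (not hsflowd source) of how this failure arises: hsflowd queries interface flags via ioctl(SIOCGIFFLAGS), and once the kernel net device has been unregistered (here by sx_netdev_cleanup during deinit) the ioctl fails with ENODEV, i.e. "No such device", matching the syslog line above. The interface name and wording are taken from the log; everything else is illustrative.

```c
/* Sketch only: query interface flags the way hsflowd does, via
 * ioctl(SIOCGIFFLAGS). After the net device is removed, the ioctl
 * fails with errno == ENODEV ("No such device"). */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

static int get_if_flags(const char *ifname)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);

    if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0) {
        /* With the interface already gone, strerror(errno) is
         * "No such device", as seen in the syslog excerpt. */
        fprintf(stderr, "device %s Get SIOCGIFFLAGS failed : %s\n",
                ifname, strerror(errno));
        close(fd);
        return -1;
    }
    close(fd);
    return ifr.ifr_flags;
}

int main(void)
{
    get_if_flags("Ethernet0");
    return 0;
}
```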
Steps to reproduce the issue:
- Enable sflow
- Reboot the device
Describe the results you received:
Error in syslog from hsflowd
Describe the results you expected:
No errors in syslog
Output of show version:
SONiC Software Version: SONiC.202311_RC.36-b4a12122b_Internal
SONiC OS Version: 11
Distribution: Debian 11.9
Kernel: 5.10.0-23-2-amd64
Build commit: ba494d16b
Build date: Tue Apr 23 09:42:43 UTC 2024
Built by: sw-r2d2-bot@r-build-sonic-ci03-241
Platform: x86_64-mlnx_msn4700-r0
HwSKU: ACS-MSN4700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2112X06507
Model Number: MSN4700-WS2F_QP1
Hardware Revision: A1
Uptime: 06:33:25 up 1:19, 2 users, load average: 1.84, 1.64, 1.09
Date: Wed 24 Apr 2024 06:33:25
Docker images:
REPOSITORY TAG IMAGE ID SIZE
docker-dhcp-relay latest 79dabcb0b517 324MB
docker-macsec latest cd38fc575b2b 344MB
docker-syncd-mlnx 202311_RC.36-b4a12122b_Internal fdbf9a0e79d5 833MB
docker-syncd-mlnx latest fdbf9a0e79d5 833MB
docker-eventd 202311_RC.36-b4a12122b_Internal 4a1e1cc515c0 315MB
docker-eventd latest 4a1e1cc515c0 315MB
docker-orchagent 202311_RC.36-b4a12122b_Internal ff702081b437 353MB
docker-orchagent latest ff702081b437 353MB
docker-platform-monitor 202311_RC.36-b4a12122b_Internal 5f39748bd0c2 821MB
docker-platform-monitor latest 5f39748bd0c2 821MB
docker-fpm-frr 202311_RC.36-b4a12122b_Internal 528c74f8fdf3 373MB
docker-fpm-frr latest 528c74f8fdf3 373MB
docker-snmp 202311_RC.36-b4a12122b_Internal a221a19ca9c5 354MB
docker-snmp latest a221a19ca9c5 354MB
docker-nat 202311_RC.36-b4a12122b_Internal 35631a74dbb7 345MB
docker-nat latest 35631a74dbb7 345MB
docker-teamd 202311_RC.36-b4a12122b_Internal 82b8463837da 342MB
docker-teamd latest 82b8463837da 342MB
docker-sflow 202311_RC.36-b4a12122b_Internal 7830164eeff3 343MB
docker-sflow latest 7830164eeff3 343MB
docker-lldp 202311_RC.36-b4a12122b_Internal d6b51144e1f8 357MB
docker-lldp latest d6b51144e1f8 357MB
docker-mux 202311_RC.36-b4a12122b_Internal c58904286de1 364MB
docker-mux latest c58904286de1 364MB
docker-database 202311_RC.36-b4a12122b_Internal de759d0d14f2 315MB
docker-database latest de759d0d14f2 315MB
docker-router-advertiser 202311_RC.36-b4a12122b_Internal 44ff307880f2 315MB
docker-router-advertiser latest 44ff307880f2 315MB
docker-sonic-gnmi 202311_RC.36-b4a12122b_Internal 0a2f64672aaa 403MB
docker-sonic-gnmi latest 0a2f64672aaa 403MB
docker-sonic-mgmt-framework 202311_RC.36-b4a12122b_Internal fe3538c89a79 417MB
docker-sonic-mgmt-framework latest fe3538c89a79 417MB
Output of show techsupport:
(paste your output here or download and attach the file here )
Additional information you deem important (e.g. issue happens only occasionally):
@Gokulnath-Raja @jeff-yin @sflow FYI
That does look like a harmless race, so perhaps it should be downgraded from ERR to INFO level. Would that solve the problem from your point of view?
(I see that in other, similar circumstances we usually only log something if debugging is enabled).
I agree that this is a harmless log. However, it might trigger alerts when seen in production systems. Changing the level to INFO or even NOTICE works for me.
This change has been committed to the host-sflow master branch: https://github.com/sflow/host-sflow/commit/e4689c674b4390681dd8ef725a8d0473abbf0572
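For reference, here is a minimal sketch of the kind of downgrade discussed above, assuming a debug knob similar to hsflowd's debug option (the knob and helper here are hypothetical; this is not the committed change, which is at the link above). When SIOCGIFFLAGS fails with ENODEV during the deinit race, the message is reported at INFO and only when debugging is enabled; other failures still log at ERR.

```c
/* Illustrative sketch only -- not the actual host-sflow commit. */
#include <errno.h>
#include <string.h>
#include <syslog.h>

static int debug_level = 0; /* assumption: a runtime debug knob */

static void report_ifflags_failure(const char *ifname)
{
    if (errno == ENODEV) {
        /* harmless deinit race: interface already removed */
        if (debug_level > 0)
            syslog(LOG_INFO, "device %s Get SIOCGIFFLAGS failed : %s",
                   ifname, strerror(errno));
    } else {
        /* any other failure is still worth an error */
        syslog(LOG_ERR, "device %s Get SIOCGIFFLAGS failed : %s",
               ifname, strerror(errno));
    }
}

int main(void)
{
    errno = ENODEV; /* simulate the post-removal failure */
    report_ifflags_failure("Ethernet0");
    return 0;
}
```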