sonic-buildimage icon indicating copy to clipboard operation
sonic-buildimage copied to clipboard

[sflow]During the de init sequence sometimes hsflow throws error log when it accesses interface after removal

Open dgsudharsan opened this issue 1 year ago • 4 comments

Description

This is a statistical issue. Sometimes very rarely during the de initialization sequence sflow accesses the net device after it is removed. We notice the error log as seen below

Apr 24 06:31:03.969346 r-leopard-56 INFO container: docker cmd: wait for syncd
Apr 24 06:31:03.970278 r-leopard-56 INFO container: docker cmd: stop for syncd
Apr 24 06:31:03.971268 r-leopard-56 DEBUG container: container_stop: END
Apr 24 06:31:04.021434 r-leopard-56 NOTICE root: Stopped syncd service...
Apr 24 06:31:04.251818 r-leopard-56 INFO kernel: [ 4653.759491] sxd_kernel: sx_netdev_cleanup
Apr 24 06:31:05.898447 r-leopard-56 ERR sflow#hsflowd: device Ethernet0 Get SIOCGIFFLAGS failed : No such device
Apr 24 06:31:06.919803 r-leopard-56 INFO kernel: [ 4656.426494] sxd_kernel: sx_core_cleanup_module
Apr 24 06:31:06.919837 r-leopard-56 NOTICE kernel: [ 4656.426516] sxd_kernel: remove a PCI core device
Apr 24 06:31:06.987789 r-leopard-56 NOTICE kernel: [ 4656.498085] sxd_kernel: sending UDEV event from sx-core driver [bus=0000:07:00.0, action=1 (KOBJ_REMOVE), origin='Unregistering from PCI']

Steps to reproduce the issue:

  1. Enable sflow
  2. Reboot the device

Describe the results you received:

Error in syslog from hsflowd

Describe the results you expected:

No Errors is syslog

Output of show version:

SONiC Software Version: SONiC.202311_RC.36-b4a12122b_Internal
SONiC OS Version: 11
Distribution: Debian 11.9
Kernel: 5.10.0-23-2-amd64
Build commit: ba494d16b
Build date: Tue Apr 23 09:42:43 UTC 2024
Built by: sw-r2d2-bot@r-build-sonic-ci03-241

Platform: x86_64-mlnx_msn4700-r0
HwSKU: ACS-MSN4700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT2112X06507
Model Number: MSN4700-WS2F_QP1
Hardware Revision: A1
Uptime: 06:33:25 up  1:19,  2 users,  load average: 1.84, 1.64, 1.09
Date: Wed 24 Apr 2024 06:33:25

Docker images:
REPOSITORY                                         TAG                               IMAGE ID       SIZE
docker-dhcp-relay                                  latest                            79dabcb0b517   324MB
docker-macsec                                      latest                            cd38fc575b2b   344MB
docker-syncd-mlnx                                  202311_RC.36-b4a12122b_Internal   fdbf9a0e79d5   833MB
docker-syncd-mlnx                                  latest                            fdbf9a0e79d5   833MB
docker-eventd                                      202311_RC.36-b4a12122b_Internal   4a1e1cc515c0   315MB
docker-eventd                                      latest                            4a1e1cc515c0   315MB
docker-orchagent                                   202311_RC.36-b4a12122b_Internal   ff702081b437   353MB
docker-orchagent                                   latest                            ff702081b437   353MB
docker-platform-monitor                            202311_RC.36-b4a12122b_Internal   5f39748bd0c2   821MB
docker-platform-monitor                            latest                            5f39748bd0c2   821MB
docker-fpm-frr                                     202311_RC.36-b4a12122b_Internal   528c74f8fdf3   373MB
docker-fpm-frr                                     latest                            528c74f8fdf3   373MB
docker-snmp                                        202311_RC.36-b4a12122b_Internal   a221a19ca9c5   354MB
docker-snmp                                        latest                            a221a19ca9c5   354MB
docker-nat                                         202311_RC.36-b4a12122b_Internal   35631a74dbb7   345MB
docker-nat                                         latest                            35631a74dbb7   345MB
docker-teamd                                       202311_RC.36-b4a12122b_Internal   82b8463837da   342MB
docker-teamd                                       latest                            82b8463837da   342MB
docker-sflow                                       202311_RC.36-b4a12122b_Internal   7830164eeff3   343MB
docker-sflow                                       latest                            7830164eeff3   343MB
docker-lldp                                        202311_RC.36-b4a12122b_Internal   d6b51144e1f8   357MB
docker-lldp                                        latest                            d6b51144e1f8   357MB
docker-mux                                         202311_RC.36-b4a12122b_Internal   c58904286de1   364MB
docker-mux                                         latest                            c58904286de1   364MB
docker-database                                    202311_RC.36-b4a12122b_Internal   de759d0d14f2   315MB
docker-database                                    latest                            de759d0d14f2   315MB
docker-router-advertiser                           202311_RC.36-b4a12122b_Internal   44ff307880f2   315MB
docker-router-advertiser                           latest                            44ff307880f2   315MB
docker-sonic-gnmi                                  202311_RC.36-b4a12122b_Internal   0a2f64672aaa   403MB
docker-sonic-gnmi                                  latest                            0a2f64672aaa   403MB
docker-sonic-mgmt-framework                        202311_RC.36-b4a12122b_Internal   fe3538c89a79   417MB
docker-sonic-mgmt-framework                        latest                            fe3538c89a79   417MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

dgsudharsan avatar Apr 25 '24 20:04 dgsudharsan

@Gokulnath-Raja @jeff-yin @sflow FYI

dgsudharsan avatar Apr 25 '24 20:04 dgsudharsan

That does look like a harmless race, so perhaps it should be downgraded from ERR to INFO level. Would that solve the problem from your point of view?

(I see that in other, similar circumstances we usually only log something if debugging is enabled).

sflow avatar Apr 26 '24 00:04 sflow

That does look like a harmless race, so perhaps it should be downgraded from ERR to INFO level. Would that solve the problem from your point of view?

(I see that in other, similar circumstances we usually only log something if debugging is enabled).

I do agree that this is an harmless log. However this might trigger alert when seen in production systems. I agree we can change the level to INFO or even NOTICE

dgsudharsan avatar Apr 26 '24 00:04 dgsudharsan

This change has been committed to the host-flow master branch: https://github.com/sflow/host-sflow/commit/e4689c674b4390681dd8ef725a8d0473abbf0572

sflow avatar Apr 26 '24 20:04 sflow