sonic-telemetry
sonic-telemetry copied to clipboard
[Tentative for memleak][No merge]: Force OS mem release
First cut; Fixed leak in my repro; [ Done on commit that is in 202012 -- 1c3f75e0d2b635dcff7d1bdff5cc9a03ed4bf7c5 ]
./gnmi_cli --client_types=gnmi -alocalhost:50051 -q COUNTERS -logtostderr -insecure -timestamp on -t COUNTERS_DB -qt s -streaming_type ON_CHANGE
TODO: Try it with repro from Pradnya.
Changes:
- Force free mem in heap release back to OS
- Fix possible subroutine leak
- Cut down a log at level 1, which can be too much
- Print count of GO routines for monitoring purpose
Some Analysis: a) No leak of subscribe clients Matched count of initial poll queries with client shutdown message
grep -i -e "mode:[ ]*poll" x | tr -s " " | cut -f13 -d' ' | sort | uniq > poll.addr
½
admin@str-s6000-acs-14:~/data$ grep -i -e "mode:[ ]*poll" x | tr -s " " | cut -f13 -d' ' | wc -l
2959
admin@str-s6000-acs-14:~/data$ grep -i -e "mode:[ ]*poll" x | tr -s " " | cut -f13 -d' ' | sort | uniq | wc -l
2959
admin@str-s6000-acs-14:~/data$ grep -e "client_subscribe.go.*shutdown" x | tr -s " " | cut -f13 -d' ' |sort | uniq | wc -l
2955
b) No leak of Go routines: Looked at the count from logs. It increased during my repro run and went back to original + 1 after I stopped the client. Not sure about that +1
Feb 12 13:26:18.012516 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:18.011588 20 telemetry.go:140] Force mem release; numRoutine=18
Feb 12 13:26:19.713555 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:19.710227 20 db_client.go:1063] Force mem release; numRoutine=25
Feb 12 13:26:19.831333 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:19.823717 20 client_subscribe.go:268] Force mem release; numRoutine=26
Feb 12 13:26:19.933282 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:19.928824 20 db_client.go:1063] Force mem release; numRoutine=26
Feb 12 13:26:20.007562 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:20.005470 20 client_subscribe.go:268] Force mem release; numRoutine=26
Feb 12 13:26:20.069412 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:20.065920 20 client_subscribe.go:268] Force mem release; numRoutine=26
...
Feb 12 14:01:02.706342 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.704680 20 db_client.go:1063] Force mem release; numRoutine=26
Feb 12 14:01:02.767171 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.763155 20 db_client.go:1063] Force mem release; numRoutine=26
Feb 12 14:01:02.818160 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.815669 20 client_subscribe.go:268] Force mem release; numRoutine=23
Feb 12 14:01:02.869548 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.866421 20 db_client.go:1063] Force mem release; numRoutine=23
Feb 12 14:01:02.870280 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.866490 20 db_client.go:227] Force mem release; numRoutine=23
Feb 12 14:01:04.796773 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:04.795646 20 telemetry.go:140] Force mem release; numRoutine=19
c) Invalid connections go away quick
The subscribe RPC call create a gnmi_client/client_subscribe and call Run on it New gnmi-client from client_subscribe wait for subscribe request Creates DB client, if target is DB DB client, does fetch on table keys and this fails DB client terminates right there gnmi-client exits right away
Feb 12 00:19:38.480827 str-s6000-acs-14 INFO telemetry#/supervisord: telemetry I0212 00:19:38.478993 19 client_subscribe.go:97] Client 25.111.173.199:38610 recieved initial query subscribe:{prefix:{target:"STATE_DB"} subscription:{path:{elem:{name:"DOCKER_STATS"}} mode:SAMPLE sample_interval:30} subscription:{path:{elem:{name:"TEST_STATS"}} mode:SAMPLE sample_interval:30} subscription:{path:{elem:{name:"TEST_STATS2"}} mode:SAMPLE sample_interval:30} subscription:{path:{elem:{name:"TEST_STATS3"}} mode:SAMPLE sample_interval:30} subscription:{path:{elem:{name:"TEST_STATS4"}} mode:SAMPLE sample_interval:30} subscription:{path:{elem:{name:"TEST_STATS5"}} mode:SAMPLE sample_interval:30} subscription:{path:{elem:{name:"TEST_STATS6"}} mode:SAMPLE sample_interval:30} mode:POLL encoding:JSON_IETF}
Feb 12 00:19:38.482243 str-s6000-acs-14 INFO telemetry#/supervisord: telemetry I0212 00:19:38.481365 19 db_client.go:506] Invalid db table Path STATE_DB TEST_STATS
Feb 12 00:19:38.482243 str-s6000-acs-14 INFO telemetry#/supervisord: telemetry I0212 00:19:38.481503 19 client_subscribe.go:134] Client 25.111.173.199:38610 shutdown