sonic-telemetry icon indicating copy to clipboard operation
sonic-telemetry copied to clipboard

[Tentative for memleak][No merge]: Force OS mem release

Open renukamanavalan opened this issue 3 years ago • 0 comments

First cut; Fixed leak in my repro; [ Done on commit that is in 202012 -- 1c3f75e0d2b635dcff7d1bdff5cc9a03ed4bf7c5 ]

./gnmi_cli --client_types=gnmi -alocalhost:50051 -q COUNTERS -logtostderr -insecure -timestamp on -t COUNTERS_DB -qt s -streaming_type ON_CHANGE

TODO: Try it with repro from Pradnya.

Changes:

  1. Force free mem in heap release back to OS
  2. Fix possible subroutine leak
  3. Cut down a log at level 1, which can be too much
  4. Print count of GO routines for monitoring purpose

Some Analysis: a) No leak of subscribe clients Matched count of initial poll queries with client shutdown message

grep -i -e "mode:[ ]*poll" x | tr -s " " | cut -f13 -d' ' | sort | uniq > poll.addr
  ½
admin@str-s6000-acs-14:~/data$ grep -i -e "mode:[ ]*poll" x | tr -s " " | cut -f13 -d' ' | wc -l
2959
admin@str-s6000-acs-14:~/data$ grep -i -e "mode:[ ]*poll" x | tr -s " " | cut -f13 -d' ' | sort | uniq | wc -l
2959
admin@str-s6000-acs-14:~/data$ grep -e "client_subscribe.go.*shutdown" x | tr -s " " | cut -f13 -d' ' |sort | uniq | wc -l
2955

b) No leak of Go routines: Looked at the count from logs. It increased during my repro run and went back to original + 1 after I stopped the client. Not sure about that +1

Feb 12 13:26:18.012516 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:18.011588      20 telemetry.go:140] Force mem release; numRoutine=18
Feb 12 13:26:19.713555 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:19.710227      20 db_client.go:1063] Force mem release; numRoutine=25
Feb 12 13:26:19.831333 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:19.823717      20 client_subscribe.go:268] Force mem release; numRoutine=26
Feb 12 13:26:19.933282 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:19.928824      20 db_client.go:1063] Force mem release; numRoutine=26
Feb 12 13:26:20.007562 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:20.005470      20 client_subscribe.go:268] Force mem release; numRoutine=26
Feb 12 13:26:20.069412 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 13:26:20.065920      20 client_subscribe.go:268] Force mem release; numRoutine=26
...
Feb 12 14:01:02.706342 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.704680      20 db_client.go:1063] Force mem release; numRoutine=26
Feb 12 14:01:02.767171 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.763155      20 db_client.go:1063] Force mem release; numRoutine=26
Feb 12 14:01:02.818160 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.815669      20 client_subscribe.go:268] Force mem release; numRoutine=23
Feb 12 14:01:02.869548 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.866421      20 db_client.go:1063] Force mem release; numRoutine=23
Feb 12 14:01:02.870280 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:02.866490      20 db_client.go:227] Force mem release; numRoutine=23
Feb 12 14:01:04.796773 str-s6000-acs-12 INFO telemetry#/supervisord: telemetry I0212 14:01:04.795646      20 telemetry.go:140] Force mem release; numRoutine=19

c) Invalid connections go away quick

The subscribe RPC call create a gnmi_client/client_subscribe and call Run on it New gnmi-client from client_subscribe wait for subscribe request Creates DB client, if target is DB DB client, does fetch on table keys and this fails DB client terminates right there gnmi-client exits right away

Feb 12 00:19:38.480827 str-s6000-acs-14 INFO telemetry#/supervisord: telemetry I0212 00:19:38.478993      19 client_subscribe.go:97] Client 25.111.173.199:38610 recieved initial query subscribe:{prefix:{target:"STATE_DB"}  subscription:{path:{elem:{name:"DOCKER_STATS"}}  mode:SAMPLE  sample_interval:30}  subscription:{path:{elem:{name:"TEST_STATS"}}  mode:SAMPLE  sample_interval:30}  subscription:{path:{elem:{name:"TEST_STATS2"}}  mode:SAMPLE  sample_interval:30}  subscription:{path:{elem:{name:"TEST_STATS3"}}  mode:SAMPLE  sample_interval:30}  subscription:{path:{elem:{name:"TEST_STATS4"}}  mode:SAMPLE  sample_interval:30}  subscription:{path:{elem:{name:"TEST_STATS5"}}  mode:SAMPLE  sample_interval:30}  subscription:{path:{elem:{name:"TEST_STATS6"}}  mode:SAMPLE  sample_interval:30}  mode:POLL  encoding:JSON_IETF}
Feb 12 00:19:38.482243 str-s6000-acs-14 INFO telemetry#/supervisord: telemetry I0212 00:19:38.481365      19 db_client.go:506] Invalid db table Path STATE_DB TEST_STATS
Feb 12 00:19:38.482243 str-s6000-acs-14 INFO telemetry#/supervisord: telemetry I0212 00:19:38.481503      19 client_subscribe.go:134] Client 25.111.173.199:38610 shutdown

renukamanavalan avatar Feb 12 '22 14:02 renukamanavalan