synse-server icon indicating copy to clipboard operation
synse-server copied to clipboard

restarted plugin is discoverable but fails to connect and get put back into the active state

Open edaniszewski opened this issue 4 years ago • 6 comments

This was discovered when updating a plugin ConfigMap, where the Deployment was configured to restart on config change in order to pick up the config.

It looks like Synse failed to connect to the plugin, which makes sense if the container went down temporarily while restarting

timestamp='2020-04-07T16:07:25.459012Z' logger='synse_server.plugin' level='info' event='error on plugin context exit' exc_type=<class 'grpc._channel._MultiThreadedRendezvous'> exc_val=<_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "failed to connect to all addresses"
debug_error_string = "{"created":"@1586275645.458472337","description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":3941,"referenced_errors":[{"created":"@1586275638.040326950","description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":393,"grpc_status":14}]}"
> exc_tb=<traceback object at 0x7f523816b288> id='59aa69eb-a5da-569f-aaa4-3ebbdb72530b'
timestamp='2020-04-07T16:07:25.459193Z' logger='synse_server.plugin' level='info' event='marking plugin as inactive' id='59aa69eb-a5da-569f-aaa4-3ebbdb72530b' tag='vaporio/modbus-ip'

the issue is that it never seems to be able to reconnect. This is running 3.0.0-alpha.19, which should have the changes to create a reconnect background task when a plugin is marked inactive (need to verify this, but I am pretty confident it is true), but I did not see any logs indicating that reconnect happened. Additionally, it should have eventually reconnected/been marked active during the periodic plugin refresh. It looks like it was marked active, but immediately afterwards fails to connect again.

timestamp='2020-04-07T17:08:53.457246Z' logger='synse_server.plugin' level='info' event='registering new plugin' addr='xx.xx.xx.xx:5004' protocol='tcp'
timestamp='2020-04-07T17:08:53.457355Z' logger='synse_server.plugin' level='debug' event='application metrics enabled: registering gRPC interceptor'
timestamp='2020-04-07T17:08:53.459808Z' logger='synse_server.plugin' level='debug' event='loaded plugin info' id='59aa69eb-a5da-569f-aaa4-3ebbdb72530b' version={'plugin_version': '3.0.0-alpha.7', 'sdk_version': '3.0.0-alpha.3', 'build_date': '2020-02-24T15:35:49Z', 'git_commit': 'ce0da27', 'git_tag': '3.0.0-alpha.7', 'arch': 'amd64', 'os': 'linux'} addr='xx.xx.xx.xx:5004' tag='vaporio/modbus-ip'
timestamp='2020-04-07T17:08:53.459918Z' logger='synse_server.plugin' level='debug' event='plugin with id already registered - skipping' id='59aa69eb-a5da-569f-aaa4-3ebbdb72530b'
timestamp='2020-04-07T17:08:53.460010Z' logger='synse_server.plugin' level='info' event='marking plugin as active' id='59aa69eb-a5da-569f-aaa4-3ebbdb72530b' tag='vaporio/modbus-ip'
timestamp='2020-04-07T17:08:53.460109Z' logger='synse_server.plugin' level='info' event='registering new plugin' addr='10.42.4.64:5003' protocol='tcp'
timestamp='2020-04-07T17:08:53.460202Z' logger='synse_server.plugin' level='debug' event='application metrics enabled: registering gRPC interceptor'
timestamp='2020-04-07T17:08:53.462390Z' logger='synse_server.plugin' level='debug' event='loaded plugin info' id='d0c746de-832e-52b7-bc38-51115117a84d' version={'plugin_version': '3.0.0-alpha.7', 'sdk_version': '3.0.0-alpha.3', 'build_date': '2020-02-05T16:51:57Z', 'git_commit': '8620ab6', 'git_tag': '3.0.0-alpha.7', 'arch': 'amd64', 'os': 'linux'} addr='10.42.4.64:5003' tag='vaporio/snmp'
timestamp='2020-04-07T17:08:53.462501Z' logger='synse_server.plugin' level='debug' event='plugin with id already registered - skipping' id='d0c746de-832e-52b7-bc38-51115117a84d'
timestamp='2020-04-07T17:08:53.462608Z' logger='synse_server.plugin' level='debug' event='plugin manager refresh complete' plugin_count=9 elapsed_time=0.07009196281433105
timestamp='2020-04-07T17:08:55.083351Z' logger='synse_server.api.http' level='debug' event='processing request' request_id='KxyYkCXy5paZgAVQs4okcB' method='GET' ip='10.193.6.3' path='/test'
timestamp='2020-04-07T17:08:55.083833Z' logger='synse_server.cmd.test' level='info' event='issuing command' request_id='KxyYkCXy5paZgAVQs4okcB' command='TEST'
timestamp='2020-04-07T17:08:55.084812Z' logger='sanic.access' level='info' event='' extra={'status': 200, 'byte': 58, 'host': '10.193.6.3:49748', 'request': 'GET http://10.42.0.27:5000/test'}
timestamp='2020-04-07T17:08:55.614599Z' logger='synse_server.api.http' level='debug' event='processing request' request_id='NHF3oYJGb2NxutuiSMC4xa' method='GET' ip='10.42.2.24' path='/test'
timestamp='2020-04-07T17:08:55.615034Z' logger='synse_server.cmd.test' level='info' event='issuing command' request_id='NHF3oYJGb2NxutuiSMC4xa' command='TEST'
timestamp='2020-04-07T17:08:55.615962Z' logger='sanic.access' level='info' event='' extra={'status': 200, 'byte': 58, 'host': '10.42.2.24:54538', 'request': 'GET http://10.42.0.27:5000/test'}
timestamp='2020-04-07T17:08:55.617963Z' logger='synse_server.api.http' level='debug' event='processing request' request_id='xtztP9vw7gsX8eE7U9FXJn' method='GET' ip='10.42.2.24' path='/v3/readcache' params={'start': ['2020-04-07T17:08:50Z']}
timestamp='2020-04-07T17:08:55.618698Z' logger='synse_server.cmd.read' level='info' event='issuing command' command='READ CACHE' start='2020-04-07T17:08:50Z' end=''
timestamp='2020-04-07T17:08:55.619088Z' logger='synse_server.cmd.read' level='debug' event='getting cached readings for plugin' plugin='vaporio/modbus-ip' command='READ CACHE'
timestamp='2020-04-07T17:08:55.702634Z' logger='synse_server.cmd.read' level='debug' event='getting cached readings for plugin' plugin='vaporio/i2c' command='READ CACHE'
timestamp='2020-04-07T17:08:55.705890Z' logger='synse_server.cmd.read' level='debug' event='getting cached readings for plugin' plugin='vaporio/i2c' command='READ CACHE'
timestamp='2020-04-07T17:08:55.710092Z' logger='synse_server.cmd.read' level='debug' event='getting cached readings for plugin' plugin='vaporio/i2c' command='READ CACHE'
timestamp='2020-04-07T17:08:55.716640Z' logger='synse_server.cmd.read' level='debug' event='getting cached readings for plugin' plugin='vaporio/modbus-ip' command='READ CACHE'
timestamp='2020-04-07T17:08:58.718346Z' logger='synse_server.plugin' level='info' event='error on plugin context exit' exc_type=<class 'grpc._channel._MultiThreadedRendezvous'> exc_val=<_MultiThreadedRendezvous of RPC that terminated with:
	status = StatusCode.DEADLINE_EXCEEDED
	details = "Deadline Exceeded"
	debug_error_string = "{"created":"@1586279338.717792605","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":69,"grpc_status":4}"
> exc_tb=<traceback object at 0x7f523a6bbbc8> id='59aa69eb-a5da-569f-aaa4-3ebbdb72530b'
timestamp='2020-04-07T17:08:58.718714Z' logger='synse_server.plugin' level='info' event='marking plugin as inactive' id='59aa69eb-a5da-569f-aaa4-3ebbdb72530b' tag='vaporio/modbus-ip'
timestamp='2020-04-07T17:08:58.718967Z' logger='synse_server.api.http' level='error' event='failure when streaming cached readings' exception='Traceback (most recent call last):\n  File "/usr/local/lib/python3.6/site-packages/synse_server/cmd/read.py", line 192, in read_cache\n    for reading in client.read_cache(start=start, end=end):\n  File "/usr/local/lib/python3.6/site-packages/synse_grpc/client.py", line 209, in read_cache\n    errors.wrap_and_raise(e)\n  File "/usr/local/lib/python3.6/site-packages/synse_grpc/errors.py", line 35, in wrap_and_raise\n    raise exception\n  File "/usr/local/lib/python3.6/site-packages/synse_grpc/client.py", line 206, in read_cache\n    for reading in self.client.ReadCache(request, timeout=self.timeout):\n  File "/usr/local/lib/python3.6/site-packages/synse_server/metrics.py", line 219, in wrap_stream_resp\n    for item in response:\n  File "/usr/local/lib/python3.6/site-packages/grpc/_channel.py", line 416, in __next__\n    return self._next()\n  File "/usr/local/lib/python3.6/site-packages/grpc/_channel.py", line 706, in _next\n    raise self\ngrpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:\n\tstatus = StatusCode.DEADLINE_EXCEEDED\n\tdetails = "Deadline Exceeded"\n\tdebug_error_string = "{"created":"@1586279338.717792605","description":"Deadline Exceeded","file":"src/core/ext/filters/deadline/deadline_filter.cc","file_line":69,"grpc_status":4}"\n>\n\nThe above exception was the direct cause of the following exception:\n\nTraceback (most recent call last):\n  File "/usr/local/lib/python3.6/site-packages/synse_server/api/http.py", line 402, in response_streamer\n    async for reading in cmd.read_cache(start, end):\n  File "/usr/local/lib/python3.6/site-packages/synse_server/cmd/read.py", line 197, in read_cache\n    ) from e\nsynse_server.errors.ServerError: error while issuing gRPC request: read cache'

So it seems like we may be hanging onto a stale connection or something. That avenue is at least worth investigating a bit.

edaniszewski avatar Apr 07 '20 18:04 edaniszewski