st2 icon indicating copy to clipboard operation
st2 copied to clipboard

Polling Sensor token has expired.

Open guzzijones opened this issue 4 years ago • 5 comments

I created a polling sensor that uses a datastore key as a queue. The poll interval is set to 10 seconds It makes a call to the datastore service to check a key for values. As new values are added this sensor will remove them as they age.

I setup an alert to notify myself when the sensor fails and I received this today:

2020-04-29 15:42:39,191 139660794638416 AUDIT access [-] Access granted to "sensor_service" with the token set to expire at "2020-04-30T15:42:39.189539Z". (username='sensor_service',token_expiration='202
2020-04-29 15:42:39,224 139660794638416 INFO intel_alien_sensor [-] last pull time: 2020-04-29T03:42:38.460618+00:00
2020-04-29 15:42:39,294 139660794638416 INFO intel_alien_sensor [-] count:3347
2020-04-29 15:42:40,094 139660794638416 INFO intel_alien_sensor [-] lmintel.intelAlienSensor no events found
2020-04-29 15:42:43,081 140467950239824 INFO rate_limit_queue [-] checking queues
2020-04-29 15:42:43,081 140467950239824 INFO keyvalue [-] checking rate limit queues
2020-04-29 15:42:43,090 140467950239824 WARNING sensor_wrapper [-] Sensor "RateLimitQueue" run method raised an exception: 401 Client Error: Unauthorized
MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
Traceback (most recent call last):
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2reactor/container/sensor_wrapper.py", line 229, in run
    self._sensor_instance.run()
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2reactor/sensor/base.py", line 119, in run
    self.poll()
  File "/opt/stackstorm/packs/lmutil27/sensors/rate_limit_queue.py", line 41, in poll
    self._client.check_todo()
  File "/opt/stackstorm/virtualenvs/lmutil27/lib/python2.7/site-packages/stackrlclient/keyvalue.py", line 314, in check_todo
    for i in self.all_todo():
  File "/opt/stackstorm/virtualenvs/lmutil27/lib/python2.7/site-packages/stackrlclient/keyvalue.py", line 271, in all_todo
    return  self._all_keyvals(RLClient.TODO)
  File "/opt/stackstorm/virtualenvs/lmutil27/lib/python2.7/site-packages/stackrlclient/keyvalue.py", line 264, in _all_keyvals
    keyvals = self.client.keys.query_with_count(prefix=prefix_str)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2client/models/core.py", line 41, in decorate
    return func(*args, **kwargs)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2client/models/core.py", line 285, in query_with_count
    instances, response = self._query_details(**kwargs)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2client/models/core.py", line 273, in _query_details
    self.handle_error(response)
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2client/models/core.py", line 167, in handle_error
    response.raise_for_status()
  File "/opt/stackstorm/st2/lib/python2.7/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
HTTPError: 401 Client Error: Unauthorized
MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo
2020-04-29 15:42:43,091 140467950239824 ERROR (unknown file) [-] Traceback (most recent call last):

2020-04-29 15:42:43,092 140467950239824 ERROR (unknown file) [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2reactor/container/sensor_wrapper.py", line 361, in <module>

2020-04-29 15:42:43,092 140467950239824 ERROR (unknown file) [-]
2020-04-29 15:42:43,092 140467950239824 ERROR (unknown file) [-] obj.run()

2020-04-29 15:42:43,092 140467950239824 ERROR (unknown file) [-]   File "/opt/stackstorm/st2/lib/python2.7/site-packages/st2reactor/container/sensor_wrapper.py", line 235, in run

2020-04-29 15:42:43,092 140467950239824 ERROR (unknown file) [-]
2020-04-29 15:42:43,092 140467950239824 ERROR (unknown file) [-] raise Exception(msg)

2020-04-29 15:42:43,093 140467950239824 ERROR (unknown file) [-] Exception
2020-04-29 15:42:43,093 140467950239824 ERROR (unknown file) [-] :
2020-04-29 15:42:43,093 140467950239824 ERROR (unknown file) [-] Sensor "RateLimitQueue" run method raised an exception: 401 Client Error: Unauthorized
MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
2020-04-29 15:42:43,093 140467950239824 ERROR (unknown file) [-]

2020-04-29 15:42:43,093 140467950239824 INFO sensor_wrapper [-] Stopping trigger watcher
2020-04-29 15:42:43,094 140467950239824 INFO sensor_wrapper [-] Invoking cleanup on sensor
2020-04-29 15:42:43,244 140255981185616 INFO process_container [-] Process for sensor lmutil27.RateLimitQueue has exited with code 1
2020-04-29 15:42:45,793 140255980311728 WARNING api [-] "auth.api_url" configuration option is not configured
2020-04-29 15:42:46,632 139688247312464 INFO sensor_wrapper [-] Found config for sensor "RateLimitQueue"
2020-04-29 15:42:46,632 139688247312464 INFO sensor_wrapper [-] Watcher started
2020-04-29 15:42:46,632 139688247312464 INFO sensor_wrapper [-] Running sensor initialization code
2020-04-29 15:42:46,633 139688247312464 INFO sensor_wrapper [-] Running sensor in active mode (poll interval=10s)
2020-04-29 15:42:46,633 139688247312464 INFO rate_limit_queue [-] checking queues
2020-04-29 15:42:46,633 139688247312464 INFO keyvalue [-] checking rate limit queues
2020-04-29 15:42:46,646 139688193976112 INFO mixins [-] Connected to amqp://guest:**@127.0.0.1:5672//

guzzijones avatar Apr 29 '20 15:04 guzzijones

It is happening every day.
I am guessing the default behavior of sensors is to let the token expire and restart the service? I do not have service_token_ttl set in my st2.conf so it is using 86400 seconds.

st2sensorcontainer.log:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
st2sensorcontainer.log:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo
st2sensorcontainer.log:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
st2sensorcontainer.log-20200425.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
st2sensorcontainer.log-20200425.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo
st2sensorcontainer.log-20200425.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
st2sensorcontainer.log-20200426.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
st2sensorcontainer.log-20200426.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo
st2sensorcontainer.log-20200426.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
st2sensorcontainer.log-20200427.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
st2sensorcontainer.log-20200427.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo
st2sensorcontainer.log-20200427.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
st2sensorcontainer.log-20200428.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.
st2sensorcontainer.log-20200428.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo
st2sensorcontainer.log-20200428.gz:MESSAGE: Unauthorized - Token has expired. for url: http://127.0.0.1:9101/v1/keys/?prefix=ratelimit2731795c-e948-11e9-81b4-2a2ae2dbcce4%3Atodo.

guzzijones avatar Apr 29 '20 16:04 guzzijones

https://github.com/StackStorm/st2/blob/9edc3fb6da7f12515f6011960f1b72538e53368e/st2common/st2common/services/datastore.py#L325 It looks like the token should automatically renew according to this documentation.

guzzijones avatar Jun 01 '20 18:06 guzzijones

Polling sensor runs in an infinite loop. Thus once the token is sent in to the service the sensor never stops and refreshes its token. https://github.com/StackStorm/st2/blob/9edc3fb6da7f12515f6011960f1b72538e53368e/st2reactor/st2reactor/sensor/base.py#L117

guzzijones avatar Jun 01 '20 18:06 guzzijones

Somewhere in this method we need to detect if the sensor start time is longer than the config.service_token_ttl - 30 seconds. If so we need to stop this sensor and start it again. https://github.com/StackStorm/st2/blob/9edc3fb6da7f12515f6011960f1b72538e53368e/st2reactor/st2reactor/container/process_container.py#L161

guzzijones avatar Jun 01 '20 19:06 guzzijones

Is this issue still relevant? My polling sensors that get key/values still manage to get a token when it's expired. From what I can see the sensor service's get_api_key will get a new key if its expired: https://github.com/StackStorm/st2/blob/419b25da9f45a92fb72549d9c4094e4cb7c6d729/st2common/st2common/services/datastore.py#L341

amanda11 avatar May 23 '22 11:05 amanda11

I'm seeing this, currently, in 3.7.0.

mamercad avatar Jan 11 '23 19:01 mamercad

Thanks for confirming @mamercad. BTW you can remove the https://github.com/StackStorm/st2/labels/status%3Ato%20be%20verified label!

arm4b avatar Jan 11 '23 20:01 arm4b

Also was impacted by this today. Restarting the sensor "fixed" the issue, but would rather have a better option that a nightly restart or something along those lines. Version 3.8.0

maxfactor1 avatar Feb 02 '23 22:02 maxfactor1

just fyi, I have abandoned sensors. main pitfalls:

  1. There is no way to query stackstorm for sensor state.
  2. sensors do not continue to try to restart and die silently after 2 tries.

You can get the same feature using cron rule and an action.

guzzijones avatar Feb 14 '23 21:02 guzzijones

Included in the base packs are rules/actions about sensors stopping.

I generally agree, though. I've had to bake in some logic to have sensors "check in" because occasionally they have died in the past and not restarted, and then created monitoring externally around that.

GitFromTheInternet avatar Feb 14 '23 22:02 GitFromTheInternet

sensors do not continue to try to restart and die silently after 2 tries.

@guzzijones BTW in ST2-K8s deployment repo there is an option of running one sensor per container. So if the sensor dies, - container too and so K8s handles the restart/recovery mechanism and it's also more visible and manageable. You can query K8s to get the sensor state.

arm4b avatar Feb 15 '23 13:02 arm4b

Looking at the error messages, it appears the error is not being generated in the sensor code but a separate library/package/file - "stackrlclient/keyvalue.py" which is creating an ST2 client instance. Is this package re-using the AUTH token generated by the sensor at startup? Can this code be updated to leverage an APIKEY instead?

jamesdreid avatar Feb 27 '23 13:02 jamesdreid

@mamercad For those sensors that are failing to get the key, are they using the set_value and get_value methods in the sensor_service to access the dataservice, or accessing in another way? My polling sensors get their tokens refreshed, but I'm accessing via the sensor_service, which in turn goes into the get_api_key which does the refresh for you. I'm just wondering why the refresh isn't working for others.

The stack trace in the example didn't seem to use the get_value/set_value methods of the sensor service, so wouldn't get the refresh code that is available.

amanda11 avatar Feb 27 '23 15:02 amanda11

@mamercad For those sensors that are failing to get the key, are they using the set_value and get_value methods in the sensor_service to access the dataservice, or accessing in another way? My polling sensors get their tokens refreshed, but I'm accessing via the sensor_service, which in turn goes into the get_api_key which does the refresh for you. I'm just wondering why the refresh isn't working for others.

The stack trace in the example didn't seem to use the get_value/set_value methods of the sensor service, so wouldn't get the refresh code that is available.

Our Polling Sensors are currently using sensor_service.get_value.

mamercad avatar Mar 01 '23 12:03 mamercad

We are running stackstorm version 3.8 and just ran into this issue as well. We updated our sensors to use self._sensor_service.get_value instead of st2_client.keys.get_by_name as @amanda11 suggested and we are no longer getting auth errors.

jschoewe avatar Apr 10 '23 16:04 jschoewe