Refactor FileWatchSensor to remove logshipper
This PR refactors the FileWatchSensor from the linux pack to remove its dependency on logshipper and use watchdog instead.
Upstream logshipper hasn't been touched in years, so we switched to our own fork. That hasn't been well maintained either, and the Tail implementation, which is all we use from it, is a bit of a mess; on top of that, it depends on pyinotify, which has itself been unmaintained for even longer.
This PR borrows some code from the previous sensor, but also breaks the code up into well-structured classes that each have a single focus, possibly allowing this code to be reused elsewhere. Additionally, you can run the sensor's Python script directly, which should help serve as an example of how to write a "good" sensor.
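For orientation, here is a sketch of what that structure might look like, using the class names that appear in the logs and discussion further down (FileWatchSensor, TailManager, SingleFileTail); everything here is inferred, not the PR's exact code:

```python
# Rough sketch of the refactored layout, inferred from the class names in the
# logs below (FileWatchSensor, TailManager, SingleFileTail). Illustrative
# only -- not the PR's exact code.
from st2reactor.sensor.base import Sensor


class FileWatchSensor(Sensor):
    """Thin StackStorm layer: maps trigger add/update/remove onto tails."""

    def setup(self):
        self.logger = self.sensor_service.get_logger(__name__)
        self.tail_manager = TailManager(logger=self.logger)

    def run(self):
        pass  # passive sensor: the work happens in watchdog callbacks

    def cleanup(self):
        self.tail_manager.stop()

    def add_trigger(self, trigger):
        # Each rule using linux.file_watch.line supplies a file_path parameter.
        self.tail_manager.tail_file(trigger["parameters"]["file_path"])
        self.tail_manager.start()

    def update_trigger(self, trigger):
        self.remove_trigger(trigger)
        self.add_trigger(trigger)

    def remove_trigger(self, trigger):
        self.tail_manager.stop_tailing(trigger["parameters"]["file_path"])


class TailManager:
    """Owns the watchdog Observer and the set of files being tailed."""

    def __init__(self, logger):
        self.logger = logger
        self.tails = {}

    def tail_file(self, path):
        self.tails[path] = SingleFileTail(path)

    def stop_tailing(self, path):
        self.tails.pop(path, None)

    def start(self): ...
    def stop(self): ...


class SingleFileTail:
    """Tracks one file: open, seek to end, read new lines, handle rotation."""

    def __init__(self, path):
        self.path = path


if __name__ == "__main__":
    ...  # the real sensor script can also be run directly for ad-hoc testing
```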
The watchdog project is actively maintained, supports Linux, macOS, BSD, and Windows, and includes a polling fallback implementation. The pyinotify package refuses to even install on macOS (not being able to run is understandable, but preventing installation is too limiting IMO).
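For readers who haven't used watchdog, here is a minimal, standalone example of its observer API (standard watchdog usage, not code from this PR):

```python
import time

from watchdog.events import FileSystemEventHandler
from watchdog.observers import Observer


class PrintHandler(FileSystemEventHandler):
    def on_modified(self, event):
        print("modified:", event.src_path)


observer = Observer()  # picks inotify/FSEvents/kqueue/polling per platform
observer.schedule(PrintHandler(), path="/tmp", recursive=False)
observer.start()
try:
    time.sleep(5)  # touch a file under /tmp to see events printed
finally:
    observer.stop()
    observer.join()
```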
I realize that it's a bit odd to make the linux pack depend less on Linux-specific features, but since pyinotify refuses to even install on macOS, I couldn't do simple things like run `make requirements` without running Linux in a full Docker or VM environment.
This PR also fixes a bug inherited from logshipper with lines that do not end in a newline: the previous code only captured the last character of such a line, while the new code captures the whole line. It also includes a workaround for another possible bug, noted in the comments.
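A minimal sketch of that line-splitting fix (hypothetical helper, not the PR's exact code): keep the whole unterminated tail of the buffer around, instead of just its last character, and emit it once the newline arrives:

```python
def split_lines(carry: str, data: str):
    """Append new data to the carried-over partial line; emit complete lines.

    The old logshipper-derived logic effectively kept only the last character
    of a line with no trailing newline; keeping the whole remainder fixes it.
    """
    carry += data
    *complete, carry = carry.split("\n")
    return complete, carry  # complete lines, plus the unterminated remainder


lines, rest = split_lines("", "foo\nbar")  # -> (["foo"], "bar")
lines, rest = split_lines(rest, "baz\n")   # -> (["barbaz"], "")
```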
Currently being held up by #5095.
Overall, I'm fine with the change, but some tests would be good :)
Just running these tests gives us excellent coverage:
```
................................................
Name                   Stmts   Miss  Branch  BrPart  Cover
--------------------------------------------------------
file_watch_sensor.py     221      8      68      11     93%
----------------------------------------------------------------------
Ran 48 tests in 14.800s

OK
```
Almost all of the untested code was in the sensor itself.
I tested this manually, both on macOS bare metal and in a Docker container (running on macOS).
`contrib/linux/Dockerfile`:

```dockerfile
FROM ubuntu:18.04
RUN apt update && apt install --yes git make python3-dev python3-pip
RUN python3 -m pip install coverage eventlet mock nose watchdog
VOLUME /code
WORKDIR /code
ENV PYTHONPATH=actions:sensors
ENV NOSE_COVER_CONFIG_FILE=.coveragerc
CMD nosetests --with-coverage --cover-branches --cover-html tests/test_file_watch_sensor.py
```
And the coverage configuration, `contrib/linux/.coveragerc`:

```ini
[run]
include = sensors/file_watch_sensor.py

[report]
include = sensors/file_watch_sensor.py
```
Commands (on bare metal):
```shell
cd contrib/linux
docker build --tag contrib-linux-pack-test .
docker run --rm --volume $(pwd):/code --workdir /code contrib-linux-pack-test
```
If you want to debug anything, run:
```shell
docker run -it --rm --volume $(pwd):/code --workdir /code contrib-linux-pack-test /bin/bash
```
Commands (within running Docker container):
```shell
# run a single test (specified at the bottom of test_file_watch_sensor.py)
python3 tests/test_file_watch_sensor.py

# run all tests and report coverage results in HTML
nosetests --with-coverage --cover-branches --cover-html tests/test_file_watch_sensor.py
```
Then you can view the results in your web browser (nose typically writes the HTML coverage report to a `cover/` directory).
I'm having an issue with the Travis tests where inotify events don't seem to be delivered. I'll have to debug that. But these testing instructions should suffice for anybody else who is interested in testing this out.
I have added a ton of tests for each piece of this PR, and I have test coverage up to 96%. I have run the tests in a Docker container on macOS bare metal, and in an st2vagrant VM, and all tests pass perfectly.
However, the tests do not run on Travis. It's as if inotify on Travis just does not work. I haven't put a lot of time into debugging that, since we are attempting to transition off of Travis entirely.
In addition to that, this code just does not actually work when run by StackStorm, and I can't figure out why. Events from inotify get put into the watchdog event queue, but they are never read out of the queue and processed. I suspect it has to do with how sensors are launched and how `run()` is called, and I think it's a nasty interaction between eventlet's cooperative green threads and watchdog's use of Python's native preemptive `threading.Thread`s.
@Kami Do you have any advice or help to offer here? I'm at a 100% standstill on this until I can get the sensor working.
Here is the log from running it (I patched StackStorm to log a bit more debug information):
```
vagrant@st2vagrant:~$ sudo -u st2 /opt/stackstorm/st2/bin/python /opt/stackstorm/st2/bin/st2sensorcontainer --config-file /etc/st2/st2.conf --sensor-ref=linux.FileWatchSensor
---- IS USING DEBUGGER: False
---- PATCH_THREAD: True
eventlet patched threading (by string): False
eventlet patched time (by string): True
2021-01-20 20:42:02,847 DEBUG [-] Using Python: 3.6.9 (/opt/stackstorm/st2/bin/python)
2021-01-20 20:42:02,847 DEBUG [-] Using config files: /etc/st2/st2.conf
2021-01-20 20:42:02,848 DEBUG [-] Using logging config: /etc/st2/logging.sensorcontainer.conf
2021-01-20 20:42:02,860 INFO [-] Connecting to database "st2" @ "127.0.0.1:27017" as user "stackstorm".
2021-01-20 20:42:02,866 INFO [-] Successfully connected to database "st2" @ "127.0.0.1:27017" as user "stackstorm".
2021-01-20 20:42:02,866 DEBUG [-] Ensuring database indexes...
2021-01-20 20:42:03,014 DEBUG [-] Skipping index cleanup for blacklisted model "PermissionGrantDB"...
2021-01-20 20:42:03,082 DEBUG [-] Indexes are ensured for models: ActionAliasDB, ActionAliasDB, ActionDB, ActionExecutionDB, ActionExecutionDB, ActionExecutionOutputDB, ActionExecutionSchedulingQueueItemDB, ActionExecutionStateDB, ActionExecutionStateDB, ApiKeyDB, ConfigDB, ConfigSchemaDB, GroupToRoleMappingDB, KeyValuePairDB, LiveActionDB, LiveActionDB, PackDB, PermissionGrantDB, PolicyDB, PolicyTypeDB, RoleDB, RuleDB, RuleEnforcementDB, RunnerTypeDB, RunnerTypeDB, SensorTypeDB, TaskExecutionDB, TokenDB, TraceDB, TriggerDB, TriggerInstanceDB, TriggerTypeDB, UserDB, UserRoleAssignmentDB, WorkflowExecutionDB
2021-01-20 20:42:03,083 DEBUG [-] Registering exchanges...
2021-01-20 20:42:03,084 DEBUG [-] Using SSL context for RabbitMQ connection: {}
2021-01-20 20:42:03,093 DEBUG [-] Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@st2vagrant', 'copyright': 'Copyright (C) 2007-2017 Pivotal Software, Inc.', 'information': 'Licensed under the MPL. See http://www.rabbitmq.com/', 'platform': 'Erlang/OTP', 'product': 'RabbitMQ', 'version': '3.6.10'}, mechanisms: [b'PLAIN', b'AMQPLAIN'], locales: ['en_US']
2021-01-20 20:42:03,095 DEBUG [-] using channel_id: 1
2021-01-20 20:42:03,096 DEBUG [-] Channel open
2021-01-20 20:42:03,096 DEBUG [-] Registered exchange st2.actionexecutionstate ({'exchange': 'st2.actionexecutionstate', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,097 DEBUG [-] Registered exchange st2.announcement ({'exchange': 'st2.announcement', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,098 DEBUG [-] Registered exchange st2.execution ({'exchange': 'st2.execution', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,098 DEBUG [-] Registered exchange st2.liveaction ({'exchange': 'st2.liveaction', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,099 DEBUG [-] Registered exchange st2.liveaction.status ({'exchange': 'st2.liveaction.status', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,100 DEBUG [-] Registered exchange st2.trigger ({'exchange': 'st2.trigger', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,101 DEBUG [-] Registered exchange st2.trigger_instances_dispatch ({'exchange': 'st2.trigger_instances_dispatch', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,101 DEBUG [-] Registered exchange st2.sensor ({'exchange': 'st2.sensor', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,102 DEBUG [-] Registered exchange st2.workflow ({'exchange': 'st2.workflow', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,102 DEBUG [-] Registered exchange st2.workflow.status ({'exchange': 'st2.workflow.status', 'type': 'topic', 'durable': True, 'auto_delete': False, 'arguments': None, 'nowait': False, 'passive': False}).
2021-01-20 20:42:03,103 DEBUG [-] Closed channel #1
2021-01-20 20:42:03,103 DEBUG [-] using channel_id: 1
2021-01-20 20:42:03,104 DEBUG [-] Channel open
2021-01-20 20:42:03,104 DEBUG [-] Predeclaring queue for exchange "st2.liveaction.status"
2021-01-20 20:42:03,106 DEBUG [-] Predeclared queue for exchange "st2.liveaction.status"
2021-01-20 20:42:03,106 DEBUG [-] Predeclaring queue for exchange "st2.liveaction.status"
2021-01-20 20:42:03,107 DEBUG [-] Predeclared queue for exchange "st2.liveaction.status"
2021-01-20 20:42:03,108 DEBUG [-] Predeclaring queue for exchange "st2.liveaction.status"
2021-01-20 20:42:03,109 DEBUG [-] Predeclared queue for exchange "st2.liveaction.status"
2021-01-20 20:42:03,109 DEBUG [-] Predeclaring queue for exchange "st2.execution"
2021-01-20 20:42:03,111 DEBUG [-] Predeclared queue for exchange "st2.execution"
2021-01-20 20:42:03,111 DEBUG [-] Predeclaring queue for exchange "st2.actionexecutionstate"
2021-01-20 20:42:03,113 DEBUG [-] Predeclared queue for exchange "st2.actionexecutionstate"
2021-01-20 20:42:03,113 DEBUG [-] Predeclaring queue for exchange "st2.trigger_instances_dispatch"
2021-01-20 20:42:03,115 DEBUG [-] Predeclared queue for exchange "st2.trigger_instances_dispatch"
2021-01-20 20:42:03,115 DEBUG [-] Predeclaring queue for exchange "st2.announcement"
2021-01-20 20:42:03,118 DEBUG [-] Predeclared queue for exchange "st2.announcement"
2021-01-20 20:42:03,118 DEBUG [-] Predeclaring queue for exchange "st2.execution"
2021-01-20 20:42:03,120 DEBUG [-] Predeclared queue for exchange "st2.execution"
2021-01-20 20:42:03,120 DEBUG [-] Predeclaring queue for exchange "st2.liveaction"
2021-01-20 20:42:03,123 DEBUG [-] Predeclared queue for exchange "st2.liveaction"
2021-01-20 20:42:03,123 DEBUG [-] Predeclaring queue for exchange "st2.execution.output"
2021-01-20 20:42:03,129 DEBUG [-] Predeclared queue for exchange "st2.execution.output"
2021-01-20 20:42:03,129 DEBUG [-] Predeclaring queue for exchange "st2.workflow.status"
2021-01-20 20:42:03,132 DEBUG [-] Predeclared queue for exchange "st2.workflow.status"
2021-01-20 20:42:03,133 DEBUG [-] Predeclaring queue for exchange "st2.workflow.status"
2021-01-20 20:42:03,135 DEBUG [-] Predeclared queue for exchange "st2.workflow.status"
2021-01-20 20:42:03,136 DEBUG [-] Predeclaring queue for exchange "st2.trigger"
2021-01-20 20:42:03,138 DEBUG [-] Predeclared queue for exchange "st2.trigger"
2021-01-20 20:42:03,139 DEBUG [-] Predeclaring queue for exchange "st2.sensor"
2021-01-20 20:42:03,141 DEBUG [-] Predeclared queue for exchange "st2.sensor"
2021-01-20 20:42:03,142 DEBUG [-] Closed channel #1
2021-01-20 20:42:03,152 DEBUG [-] Inserting system roles (['system_admin', 'admin', 'observer'])
2021-01-20 20:42:03,173 DEBUG [-] found extension EntryPoint.parse('echo = st2common.metrics.drivers.echo_driver:EchoDriver')
2021-01-20 20:42:03,173 DEBUG [-] found extension EntryPoint.parse('noop = st2common.metrics.drivers.noop_driver:NoopDriver')
2021-01-20 20:42:03,174 DEBUG [-] found extension EntryPoint.parse('statsd = st2common.metrics.drivers.statsd_driver:StatsdDriver')
2021-01-20 20:42:03,175 INFO [-] Running in single sensor mode, using a single sensor partitioner...
2021-01-20 20:42:03,177 INFO [-] Setting up container to run 1 sensors.
2021-01-20 20:42:03,178 INFO [-] Sensors list - ['linux.FileWatchSensor'].
2021-01-20 20:42:03,178 INFO [-] (PID:19674) SensorContainer started.
2021-01-20 20:42:03,178 DEBUG [-] Using SSL context for RabbitMQ connection: {}
SPAWNING eventlet thread
2021-01-20 20:42:03,179 DEBUG [-] Starting sensor CUD watcher...
2021-01-20 20:42:03,179 DEBUG [-] Using SSL context for RabbitMQ connection: {}
SPAWNING eventlet thread
2021-01-20 20:42:03,179 INFO [-] Running sensor linux.FileWatchSensor
2021-01-20 20:42:03,180 DEBUG [-] Creating temporary auth token for sensor FileWatchSensor
2021-01-20 20:42:03,188 DEBUG [-] Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@st2vagrant', 'copyright': 'Copyright (C) 2007-2017 Pivotal Software, Inc.', 'information': 'Licensed under the MPL. See http://www.rabbitmq.com/', 'platform': 'Erlang/OTP', 'product': 'RabbitMQ', 'version': '3.6.10'}, mechanisms: [b'PLAIN', b'AMQPLAIN'], locales: ['en_US']
2021-01-20 20:42:03,189 INFO [-] Connected to amqp://guest:**@127.0.0.1:5672//
2021-01-20 20:42:03,189 DEBUG [-] using channel_id: 1
2021-01-20 20:42:03,190 DEBUG [-] Channel open
2021-01-20 20:42:03,200 AUDIT [-] Access granted to "sensors_container" with the token set to expire at "2021-01-21T20:42:03.195913Z". (username='sensors_container',token_expiration='2021-01-21T20:42:03.195913Z')
2021-01-20 20:42:03,201 WARNING [-] "auth.api_url" configuration option is not configured
2021-01-20 20:42:03,201 DEBUG [-] Running sensor subprocess (cmd="/opt/stackstorm/st2/bin/python /opt/stackstorm/st2/lib/python3.6/site-packages/st2reactor/container/sensor_wrapper.py --pack=linux --file-path=/opt/stackstorm/packs/linux/sensors/file_watch_sensor.py --class-name=FileWatchSensor --trigger-type-refs=linux.file_watch.line --parent-args=["--config-file", "/etc/st2/st2.conf", "--sensor-ref=linux.FileWatchSensor"]")
2021-01-20 20:42:03,206 DEBUG [-] Dispatching trigger (trigger=core.st2.sensor.process_spawn,payload={'trigger': 'core.st2.sensor.process_spawn', 'payload': {'id': 'FileWatchSensor', 'timestamp': 1611175323, 'pid': 19680, 'cmd': '/opt/stackstorm/st2/bin/python /opt/stackstorm/st2/lib/python3.6/site-packages/st2reactor/container/sensor_wrapper.py --pack=linux --file-path=/opt/stackstorm/packs/linux/sensors/file_watch_sensor.py --class-name=FileWatchSensor --trigger-type-refs=linux.file_watch.line --parent-args=["--config-file", "/etc/st2/st2.conf", "--sensor-ref=linux.FileWatchSensor"]'}, 'trace_context': None})
2021-01-20 20:42:03,216 DEBUG [-] Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@st2vagrant', 'copyright': 'Copyright (C) 2007-2017 Pivotal Software, Inc.', 'information': 'Licensed under the MPL. See http://www.rabbitmq.com/', 'platform': 'Erlang/OTP', 'product': 'RabbitMQ', 'version': '3.6.10'}, mechanisms: [b'PLAIN', b'AMQPLAIN'], locales: ['en_US']
2021-01-20 20:42:03,218 DEBUG [-] using channel_id: 1
2021-01-20 20:42:03,219 DEBUG [-] Channel open
2021-01-20 20:42:03,230 DEBUG [-] Closed channel #1
2021-01-20 20:42:03,230 INFO [-] Sensor linux.FileWatchSensor started
---- IS USING DEBUGGER: False
---- PATCH_THREAD: True
eventlet patched threading (by string): False
eventlet patched time (by string): True
2021-01-20 20:42:04,183 DEBUG [-] Using SSL context for RabbitMQ connection: {}
2021-01-20 20:42:04,192 INFO [-] No config found for sensor "FileWatchSensor"
2021-01-20 20:42:04,193 DEBUG [-] Using SSL context for RabbitMQ connection: {}
SPAWNING eventlet thread
SPAWNING eventlet thread
2021-01-20 20:42:04,193 INFO [-] Watcher started
2021-01-20 20:42:04,194 INFO [-] Running sensor initialization code
2021-01-20 20:42:04,194 INFO [-] Running sensor in passive mode
2021-01-20 20:42:04,195 DEBUG [-] Running TailManager
2021-01-20 20:42:04,203 DEBUG [-] Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@st2vagrant', 'copyright': 'Copyright (C) 2007-2017 Pivotal Software, Inc.', 'information': 'Licensed under the MPL. See http://www.rabbitmq.com/', 'platform': 'Erlang/OTP', 'product': 'RabbitMQ', 'version': '3.6.10'}, mechanisms: [b'PLAIN', b'AMQPLAIN'], locales: ['en_US']
2021-01-20 20:42:04,205 INFO [-] Connected to amqp://guest:**@127.0.0.1:5672//
2021-01-20 20:42:04,205 DEBUG [-] using channel_id: 1
2021-01-20 20:42:04,207 DEBUG [-] Channel open
2021-01-20 20:42:04,209 DEBUG [-] Found existing trigger: TriggerDB(description=None, id=5ffddd7860943ff22b87523f, metadata_file=None, name="1e11e041-4d13-4bdf-8d79-95fdf43b6f29", pack="linux", parameters={'file_path': '/home/vagrant/test.log', 'follow': True}, ref="linux.1e11e041-4d13-4bdf-8d79-95fdf43b6f29", ref_count=1, type="linux.file_watch.line", uid="trigger:linux:1e11e041-4d13-4bdf-8d79-95fdf43b6f29:3eb0e4c361ea826f7456ee08b883b15c") in db.
2021-01-20 20:42:04,210 DEBUG [-] Calling sensor "add_trigger" method (trigger.type=linux.file_watch.line)
2021-01-20 20:42:04,211 DEBUG [-] Tailing single file: /home/vagrant/test.log
2021-01-20 20:42:04,211 DEBUG [-] Setting path to /home/vagrant/test.log
2021-01-20 20:42:04,212 DEBUG [-] Opening file '/home/vagrant/test.log'
2021-01-20 20:42:04,212 DEBUG [-] Seeking to end
2021-01-20 20:42:04,213 DEBUG [-] Scheduling watch on file: '/home/vagrant/test.log'
2021-01-20 20:42:04,213 DEBUG [-] Scheduled watch on file: '/home/vagrant/test.log'
2021-01-20 20:42:04,214 DEBUG [-] Parent watch: None
2021-01-20 20:42:04,214 DEBUG [-] Scheduling watch on parent directory: '/home/vagrant'
2021-01-20 20:42:04,215 DEBUG [-] Scheduled watch on parent directory: '/home/vagrant'
2021-01-20 20:42:04,215 INFO [-] Added file "/home/vagrant/test.log"
2021-01-20 20:42:04,216 DEBUG [-] Starting TailManager
2021-01-20 20:42:04,216 DEBUG [-] Starting BaseObserver
2021-01-20 20:42:04,217 DEBUG [-] Starting emitter: <InotifyEmitter(Thread-2, initial daemon)>
2021-01-20 20:42:04,218 DEBUG [-] Starting BaseThread
2021-01-20 20:42:04,218 DEBUG [-] Starting BaseThread
inotify_events: [<InotifyEvent: src_path=b'/home/vagrant/test.log', wd=1, mask=IN_MODIFY, cookie=0, name=>]
2021-01-20 20:42:08,884 DEBUG [-] in-event <InotifyEvent: src_path=b'/home/vagrant/test.log', wd=1, mask=IN_MODIFY, cookie=0, name=>
2021-01-20 20:42:08,899 DEBUG [-] grouped: [<InotifyEvent: src_path=b'/home/vagrant/test.log', wd=1, mask=IN_MODIFY, cookie=0, name=>]
2021-01-20 20:42:08,899 DEBUG [-] Putting event on queue: <InotifyEvent: src_path=b'/home/vagrant/test.log', wd=1, mask=IN_MODIFY, cookie=0, name=>
^C2021-01-20 20:42:13,260 INFO [-] Container shutting down. Invoking cleanup on sensors.
2021-01-20 20:42:14,263 INFO [-] All sensors are shut down.
2021-01-20 20:42:14,263 DEBUG [-] Shutting down sensor watcher.
2021-01-20 20:42:14,273 DEBUG [-] Closed channel #1
2021-01-20 20:42:14,274 DEBUG [-] Start from server, version: 0.9, properties: {'capabilities': {'publisher_confirms': True, 'exchange_exchange_bindings': True, 'basic.nack': True, 'consumer_cancel_notify': True, 'connection.blocked': True, 'consumer_priorities': True, 'authentication_failure_close': True, 'per_consumer_qos': True, 'direct_reply_to': True}, 'cluster_name': 'rabbit@st2vagrant', 'copyright': 'Copyright (C) 2007-2017 Pivotal Software, Inc.', 'information': 'Licensed under the MPL. See http://www.rabbitmq.com/', 'platform': 'Erlang/OTP', 'product': 'RabbitMQ', 'version': '3.6.10'}, mechanisms: [b'PLAIN', b'AMQPLAIN'], locales: ['en_US']
2021-01-20 20:42:14,276 DEBUG [-] using channel_id: 1
2021-01-20 20:42:14,278 DEBUG [-] Channel open
2021-01-20 20:42:14,282 DEBUG [-] Closed channel #1
2021-01-20 20:42:14,282 INFO [-] (PID:19674) SensorContainer stopped. Reason - KeyboardInterrupt
```
Will rebase on master once #5098 is merged to see if inotify works properly on GitHub Actions.
But the test_file_watch_sensor.py tests still need to accurately reflect a production sensor environment and they clearly do not (yet).
I need to circle back around to this to update the code with black and see if the tests run on GHA.
But also, I want to test this code on a live system to make sure it actually works. If this code passes all of our tests but doesn't actually work in production, then both the code and the tests for it are 100% worthless.
The compile step is failing on code I did not touch.
I don't expect to have time to debug this in the foreseeable future.
Possibly blocked by #5257.
The requirements compile issues have been resolved, as has the pylint error about importing pika in a tools/ script.
Now there's one consistent error left in the unit tests, though I don't know why this PR would cause that:
```
======================================================================
1) FAIL: test_process_error_handling (tests.unit.test_workflow_engine.WorkflowExecutionHandlerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  virtualenv/lib/python3.6/site-packages/mock/mock.py line 1346 in patched
    return func(*newargs, **newkeywargs)
  st2actions/tests/unit/test_workflow_engine.py line 203 in test_process_error_handling
    self.assertEqual(t1_ex_db.status, wf_statuses.FAILED)
AssertionError: 'succeeded' != 'failed'
- succeeded
+ failed
```
Well. I can't reproduce that test failure in CI. It might be related to #5358 which was just merged and touched the test that was failing. But, it's not failing anymore. If it shows up again, we can look into it.
Anyway, all tests are green now.
I'm currently testing this.
- :white_check_mark: The first issue is that the README is not correct (and wasn't correct for the previous sensor either). You do NOT add a list of files to watch in the pack config. Instead, you must create a rule that uses the `linux.file_watch.line` trigger. You define the `file_path` trigger parameter in that rule.
- Next issue: Using the original sensor, I've tried several variations on the rule, but I don't see any trigger instances or rule enforcements. So, the status quo does not work for me. Interestingly, the sensorcontainer logs do show that it adds and watches the correct file.
- :white_check_mark: Next issue: If a rule is updated (e.g. changing the file path being watched), then the sensor needs to handle the update and change the watched path. (Testing the sensor in this PR.)
OK. With debug logging (using the sensor from this PR), I can see that:

- `TailManager.run()` logs its `Running TailManager` line
- `FileWatchSensor.add_trigger()` gets called as expected, due to a rule that uses the appropriate parametrized trigger
- `tail_manager.tail_file()` gets called
  - `SingleFileTail` inits correctly
  - opens the file
  - runs `observer.schedule()` to start the file and parent dir watches
- `tail_manager.start()` gets called, logging its `Starting TailManager` line
- the `observer.start()` method does not seem to be returning. I do not see the `Started Observer, emitters` log line.
  - `observer.start()` loops through its emitters, calling `emitter.start()`, but that is already called in `observer.schedule()`, so that's probably not it
  - `observer.start()` also calls `BaseThread.start()` (via `super()`; approx MRO: `BaseObserver(EventDispatcher(watchdog.utils.BaseThread))`, which can be checked with the sketch after this list)
    - `BaseThread(threading.Thread).start()` calls `self.on_thread_start()`
      - This inits an `InotifyBuffer`, which is itself a `BaseThread`. `inotify_buffer` is getting the file change notifications correctly, because its log lines are showing up
    - `BaseThread(threading.Thread).start()` should finally call `threading.Thread.start(self)`, but I can't tell if this gets called or if it completes
- so, `add_trigger` never returns - this is probably one reason updates don't happen.

`observer` is a `watchdog.observers.Observer` instance.
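The approximate MRO above is easy to confirm directly (assuming the watchdog 2.x module layout):

```python
# Print the method resolution order of watchdog's BaseObserver
# (watchdog 2.x module layout assumed).
from watchdog.observers.api import BaseObserver

for cls in BaseObserver.__mro__:
    print(cls)
# Expected to include, in order: BaseObserver -> EventDispatcher
# -> watchdog.utils.BaseThread -> threading.Thread -> object
```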
We might also need to modify:

- `TailManager.run()`, as it uses `time.sleep`... not sure if we need to use eventlet's version
Since this doesn't work with eventlet yet, I marked it as draft.
Using `eventlet.debug.spew()`, it looks like watchdog's observer is blocking while trying to read events with `os.read(self._inotify_fd, event_buffer_size)`. If the thread patching had worked, it would have switched contexts back to something else (another greenlet/thread).
- https://github.com/gorakhargosh/watchdog/blob/v2.1.6/src/watchdog/observers/inotify_buffer.py#L88
- https://github.com/gorakhargosh/watchdog/blob/v2.1.6/src/watchdog/observers/inotify_c.py#L285
When I modify the file, this forces it to continue on to the next line (because it was able to read the fd). Then it seems to get lost in the logger:
- https://github.com/gorakhargosh/watchdog/blob/v2.1.6/src/watchdog/observers/inotify_buffer.py#L57
Subsequent file edits never spew anything else from watchdog. Instead there are some lines from `eventlet.green.threading`, a loop in `eventlet.patcher`, and a couple of lines from `threading` about a limbo lock... Something is not respecting eventlet. Not sure what.
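Here is a toy reproduction of what I believe is happening; this is an assumption, not a verified diagnosis. Under eventlet's monkey patching, watchdog's "threads" are green threads, so a raw blocking `os.read()` stalls the single OS thread running the hub, and no other greenlet gets scheduled until the read returns:

```python
import eventlet

eventlet.monkey_patch()  # threading.Thread now produces green threads

import os
import threading  # patched: these "threads" are greenlets

# Grab the *unpatched* threading module so the demo can unblock itself from a
# real OS thread instead of deadlocking.
real_threading = eventlet.patcher.original("threading")

r, w = os.pipe()


def reader():
    print("reader: entering blocking os.read() (stalls the whole hub)")
    data = os.read(r, 1024)  # raw syscall on a blocking fd: never yields
    print("reader got:", data)


def heartbeat():
    # Only prints once os.read() returns, showing the hub was blocked.
    print("heartbeat")


threading.Thread(target=reader).start()
threading.Thread(target=heartbeat).start()

# A genuine OS thread writes to the pipe after 1s so the demo terminates.
real_threading.Timer(1.0, os.write, args=(w, b"ping")).start()
eventlet.sleep(2)
```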
History of PRs touching the sensor in this pack (my git spelunking notes):
- #1134
- #1966
- #2695
- #3293
- #3361
- #3398
- #3475
- #4491
- copyright/license PR(s)
- this one :)
> ...
> - `tail_manager.start()` gets called, logging its `Starting TailManager` line
> - the `observer.start()` method does not seem to be returning. I do not see the `Started Observer, emitters` log line.
>   - `observer.start()` loops through its emitters, calling `emitter.start()`, but that is already called in `observer.schedule()`, so that's probably not it
>   - `observer.start()` also calls `BaseThread.start()` (via `super()`; approx MRO: `BaseObserver(EventDispatcher(watchdog.utils.BaseThread))`)
>     - `BaseThread(threading.Thread).start()` calls `self.on_thread_start()`
>       - This inits an `InotifyBuffer`, which is itself a `BaseThread`. `inotify_buffer` is getting the file change notifications correctly, because its log lines are showing up
>     - `BaseThread(threading.Thread).start()` should finally call `threading.Thread.start(self)`, but I can't tell if this gets called or if it completes
> - so, `add_trigger` never returns - this is probably one reason updates don't happen.
>
> `observer` is a `watchdog.observers.Observer` instance.
Watchdog uses threading. But the threading module is not getting monkey patched. I tried adjusting the monkey patching with `monkey_patch(patch_thread=True)`, but that made no difference.

Actually, it looks like threading is partially getting monkey patched, but adjusting that monkey patch call didn't change anything.

For reference, I used the following to determine the state of monkey patching, just before the observer "thread" is supposed to start. In every scenario I tried, I got `eventlet.green.threading` and `False`, so threading is monkey patched, but eventlet doesn't think it is. :shrug: So, this is going to have to wait until we can get rid of eventlet. :sob:
```python
def start(self):
    if self.tails and not self.started:
        self.logger.debug("Starting TailManager")
        import threading
        self.logger.debug(f"threading monkey_patch: {threading.current_thread.__module__}")
        import eventlet
        self.logger.debug(f"threading monkey_patch: {eventlet.patcher.is_monkey_patched(threading)}")
        self.observer.start()
```
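One possible (unverified) explanation for the confusing `False`: eventlet's patcher may record the patch under the key `thread` rather than `threading`, in which case querying by the `threading` module reports unpatched even though it was patched:

```python
# Unverified assumption: eventlet may track thread patching under the key
# "thread", making is_monkey_patched("threading") report False misleadingly.
import eventlet

eventlet.monkey_patch()

print(eventlet.patcher.is_monkey_patched("thread"))     # the key eventlet uses
print(eventlet.patcher.is_monkey_patched("threading"))  # may still be False
```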
> We might also need to modify:
>
> - `TailManager.run()`, as it uses `time.sleep`... not sure if we need to use eventlet's version
I tested using `eventlet.sleep`, but that didn't make a difference, because the context never switches back to that loop.
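For reference, the variant I tested looked roughly like this (hypothetical body of `TailManager.run()`, with an assumed `self._stopped` flag):

```python
import eventlet


def run(self):
    self.logger.debug("Running TailManager")
    while not self._stopped:  # hypothetical stop flag
        # eventlet.sleep() yields to the hub explicitly, unlike an unpatched
        # time.sleep(); it made no difference here because control never
        # comes back to this loop once the observer blocks the hub.
        eventlet.sleep(1)
```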