Watching a multiline string causes problems
[!CAUTION] Do not attempt to reproduce this on the production SmokeDetector instance. It will very likely break stuff and cause problems.
Summary
Attempting to use !!/watch-force to watch a multiline chat message breaks the watched_keywords.txt file by improperly inserting a newline.
Background information
There was a desire to modify a watch in CHQ, but the watch was longer than the limit for a 'normal' chat message. This isn't inherently an issue if it gets modified without the use of chat.
While multiline messages can get around that restriction, that also introduces a newline character (\n). From the regex perspective, that can be addressed (either by adding {0} directly after the newline or by starting a regex comment before the newline and ending it afterwards, as noted by @makyen).
However, it also breaks SmokeDetector as that inserts a newline directly into the watched_keywords.txt file, which breaks it. This was wisely foreseen by Makyen, hence not testing on production.
Details
- For reference, here is the string I tested with. The newline (created with Shift + Enter) has been replaced with
\nfor this GitHub issue.000000[trimmed for brevity]000000(?# foo\nbar) -
SD can receive multiline chat messages, but at least in
!!/bisect, it treats it as a\n. It just handles them problematically for watches (and presumably blacklists). - Attempting to watch a multiline string in chat, as I tried to here, breaks stuff. Specifically, it produced this as the end of
watched_keywords.txt.
And this console error:65274 1723875289 cocomac 00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000(?# foo 65275 bar)[06:19:34.736] watched_keywords.txt:65275:not enough values to unpack (expected 3, got 1) File "/home/ubuntu/SmokeDetector/blacklists.py", line 105, in parse when, by_whom, what = line.rstrip().split('\t') ^^^^^^^^^^^^^^^^^^^ [06:19:40.636] Global blacklists loaded - Here is the relevant part of the console logs after attempting a bisect of some text that would match the regex.
[06:21:42.072] Command received: !!/bisect 00000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000000000000000000000000000<span>…</span> [06:21:50.007] Attempted to send status ping but metasmoke_host is undefined. Not sent. [06:22:10.883] regex._regex_core.error: missing ) at position 975 2024-08-17 06:22:10.878529 UTC File "/home/ubuntu/SmokeDetector/chatcommunicate.py", line 530, in __call__ result = self.__func__(*processed_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1691, in bisect results_bookended = bisect_regex_in_n_size_chunks(64, s, bookended_regexes, bookend=True, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1624, in bisect_regex_in_n_size_chunks results = bisect_regex(test_text, chunk, bookend=bookend, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1573, in bisect_regex compiled = regex_compile_no_cache(formatted_regex, city=findspam.city_list, ignore_unused=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/SmokeDetector/helpers.py", line 491, in regex_compile_no_cache return regex_raw_compile(regex_text, flags, ignore_unused, kwargs, False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/regex/regex.py", line 542, in _compile raise error(caught_exception.msg, caught_exception.pattern, File "/home/ubuntu/SmokeDetector/chatcommunicate.py", line 530, in __call__ result = self.__func__(*processed_args) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1691, in bisect results_bookended = bisect_regex_in_n_size_chunks(64, s, bookended_regexes, bookend=True, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1624, in bisect_regex_in_n_size_chunks results = bisect_regex(test_text, chunk, bookend=bookend, timeout=timeout) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/SmokeDetector/chatcommands.py", line 1573, in bisect_regex compiled = regex_compile_no_cache(formatted_regex, city=findspam.city_list, ignore_unused=True) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/ubuntu/SmokeDetector/helpers.py", line 491, in regex_compile_no_cache return regex_raw_compile(regex_text, flags, ignore_unused, kwargs, False) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/regex/regex.py", line 542, in _compile raise error(caught_exception.msg, caught_exception.pattern,
What problem has occurred? What issues has it caused?
It broke. Details are above. As this was on a testing instance, it didn't impact the production instance.
On my testing instance, it broke the watched_keywords.txt file, which would need manual intervention to fix. When I tested locally, it failed to push the broken watched_keywords.txt file to GitHub, but I don't know if that's due to my testing setup or this bug. Various Git-related SD commands might be able to restore it to a functional state, although I did not test that, so I cannot confirm if that would be a remedy if this happened to a production SD instance.
What would you like to happen/not happen?
"like to happen"
- Reject a multiline chat watch with a helpful message. Something like Multiline chat watches are not supported. Consider using Git (or pinging someone that can) as an alternative. might work*
- Accept the multiline watch and correctly input it into the relevant file. This would also require deciding how to treat the newline (leave it as-is, remove the newline, remove everything after & including the first newline, etc.)
*I don't know what the preference is on the best way to request a manual Git-based change, so if that path is chosen, someone should probably confirm that wording.
"like to not happen"
It should not corrupt files.
First of all, thanks for testing this and taking the effort not to do so in production. I'm in favour of rejecting the multiline chat watch with a helpful message. We already have a system in place to deal with 'wrong' watch suggestions, adding another detection with message seems like the least complicated fix for this bug.
Consider long watches are not necessarily something we want a lot of on the watchlist. If lists of exclusions need to increase in length (which triggered this test) regularly, it must be understood at some point it may be better to split the watch into multiple watches that are more specific and don't require as many exclusions. Watches with many exclusions are too dangerous to upgrade to the blacklist if you don't know when a new exception will pop up.
For the few cases where this isn't possible, making the change through Git (or requesting the change to be made by someone else) seems like the appropriate solution. This also makes it easier to see how a watch has mutated over time, in my opinion.
If this is considered a complicated process, we can consider amending the wiki with instructions. Working with longer regex are somewhat of an advanced part of Charcoal. Getting people who are interested in that more involved on Git isn't a bad thing.