session-pysogs

Profanity list: If string is too long, sogs fails to start properly

Open slrslr opened this issue 2 years ago • 4 comments

I suspect that the profanity filtering does not support longer phrases like: privacy?public_key=118df8c6c471ac0468c7c77e1cdc12f24a139ee8a07c6e3bf4e7855640dad821 or aaaaaaaaaaadaaaaaaa118df8c6c471ac0468c7c77e1cdc12f24a139ee8a07c6e3bf4e7855640dad821

When I added it and ran "sudo systemctl restart sogs", I got:

Job for sogs-proxied.service failed. See "systemctl status sogs-proxied.service" and "journalctl -xe" for details.

$ sogs --version
PySOGS 0.3.5

I am on Debian 11 (Linux), installed from the .deb package.

Can you reproduce it and fix it please?

slrslr, Dec 13 '22 13:12

A same or similar issue regarding the length of the profanity phrases: when you have some longer phrases, like in this profanity blocklist, sogs commands such as "sogs -Lv" take too much time to execute and end in an OOM kill:

Out of memory: Killed process 24843 (uwsgi) total-vm:1100772kB, anon-rss:286844kB, file-rss:0kB, shmem-rss:180kB, UID:1000 pgtables:2152kB oom_score_adj:0

Yet when you empty the profanity blocklist, or just trim all lines to a maximum of 6 characters (cut -c -6 inputfile > outputfile; mv outputfile yourprofanityfile), the command executes very quickly and without an OOM kill.

When I sort the profanity file that causes the issue by line length:

awk '{print length, $0}' /var/lib/session-open-group-server/profanity-block-list.txt | sort -n | cut -d " " -f2-

the longest lines are:

to have my limits pushed
send me something corrupt
New group, dm with example
DM me with sample for group
chat with me about anything
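For anyone who prefers Python over the awk pipeline, here is a rough equivalent. It is only a sketch: the function names are made up for illustration, and unlike the awk version it skips empty lines. The sample phrases are taken from this post.

```python
from pathlib import Path


def lines_by_length(text):
    """Return the non-empty lines of `text`, shortest first."""
    lines = [line for line in text.splitlines() if line.strip()]
    # sorted() is stable, so lines of equal length keep their file order.
    return sorted(lines, key=len)


def longest_lines(path, count=5):
    """Read a blocklist file and return its `count` longest lines."""
    text = Path(path).read_text(encoding="utf-8")
    return lines_by_length(text)[-count:]


if __name__ == "__main__":
    # In-memory sample so the script runs without the real blocklist file.
    sample = "send me something corrupt\nto have my limits pushed\nDM me\n"
    for line in lines_by_length(sample):
        print(len(line), line)
```

To run it against the real file, call longest_lines() with the blocklist path used in the awk command.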

After shortening these, I think the sogs restart time decreased by ~2 seconds and "sogs -Lv" no longer triggers an OOM kill, even though it is still very slow (delays of about 8 seconds before any output, likely because the profanity list is being loaded, maybe uselessly; 8 seconds is also equal to the sogs restart time, by the way). @jagerman @mdPlusPlus @majestrate

Debian GNU/Linux 11 (bullseye), 5.10.0-18-amd64, PySOGS 0.3.7

slrslr, Jun 11 '23 05:06

Some of the following phrases also cause issues in the sogs profanity blocklist, adding 30-40 seconds to the sogs restart time:

willing to do tributes - issue
willing to do tributess - issue
willing to do tribuees - issue
willing to do tribites - issue
willing to do tributey - ok (no increase)
willing to do tributet - ok (no increase)
willing to do tributeyy - ok (no increase)
aaaaaaa aa aa aaaaates - issue
aaaaaaa aaaa aaaaaaas - Job for sogs-proxied.service failed. (after like 30 seconds)
aaaaaaaaaaaaaaaates - Job for sogs-proxied.service failed. (after like 30 seconds)
aaaaaaaaaaaaa aaa tes - Job for sogs-proxied.service failed. (after like 30 seconds)
looking for other groups - ok (no increase)
from Central Maine - issue

Maybe not every character has the same size, and there is some maximum threshold per phrase where it starts causing issues?

slrslr, Jul 06 '23 11:07

On Thursday, 6 July 2023 07:21:16 EDT slrslr wrote:

Some of the following phrases also cause issues in sogs profanity blocklist, causing +30-40 seconds increase of the sogs restart time:

willing to do tributes - issue
willing to do tributess - issue
willing to do tribuees - issue
willing to do tribites - issue
willing to do tributey - ok (no increase)
willing to do tributet - ok (no increase)
willing to do tributeyy - ok (no increase)
aaaaaaa aa aa aaaaates - issue
aaaaaaa aaaa aaaaaaas - Job for sogs-proxied.service failed. (after like 30 seconds)
aaaaaaaaaaaaaaaates - Job for sogs-proxied.service failed. (after like 30 seconds)
aaaaaaaaaaaaa aaa tes - Job for sogs-proxied.service failed. (after like 30 seconds)

Maybe not every character has the same size, and there is some maximum threshold per phrase where it starts causing issues?

I think the way the filter is implemented is really naive and results in quadratic complexity in the number and size of the phrases.
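To illustrate the kind of cost a different design could avoid, here is a rough sketch of a matcher whose per-message work does not grow with the number of blocked phrases. This is not how sogs implements its filter (nothing in this thread shows that code); it assumes exact, case-insensitive, whole-word phrase matching, and PhraseBlocklist and its methods are hypothetical names.

```python
import re

WORD_RE = re.compile(r"\w+")


def tokenize(text):
    """Lowercase `text` and split it into word tokens."""
    return WORD_RE.findall(text.lower())


class PhraseBlocklist:
    """Hypothetical exact-phrase matcher backed by a set of word tuples."""

    def __init__(self, phrases):
        self._phrases = set()
        self._max_words = 0
        for phrase in phrases:
            words = tuple(tokenize(phrase))
            if words:
                self._phrases.add(words)
                self._max_words = max(self._max_words, len(words))

    def contains_blocked(self, message):
        """Check every word n-gram of the message with a set lookup."""
        words = tokenize(message)
        for start in range(len(words)):
            for size in range(1, self._max_words + 1):
                if start + size > len(words):
                    break
                if tuple(words[start:start + size]) in self._phrases:
                    return True
        return False


if __name__ == "__main__":
    blocklist = PhraseBlocklist(["willing to do tributes", "from Central Maine"])
    print(blocklist.contains_blocked("Hi, I am from central maine!"))
```

Building the set is linear in the total size of the list, and checking a message costs roughly (words in message) x (words in longest phrase) set lookups, however many phrases are loaded. The trade-off is that it does not catch obfuscated spellings.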

This is something that could likely be implemented with a Bloom filter or a probabilistic negative-lookup filter. Funnily enough, there is a very fast one that is almost perfect for this:

https://github.com/NationalSecurityAgency/XORSATFilter
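For readers unfamiliar with the idea, a minimal Bloom filter can be sketched in a few lines of Python. This is a plain Bloom filter, not the XORSAT filter linked above, and it is not taken from sogs; the class name, bit count and hash count are arbitrary choices for illustration.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: fixed memory, no false negatives."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        """Yield one bit position per hash, using a different salt each time."""
        data = item.encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                data, digest_size=8, salt=seed.to_bytes(2, "big")
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


if __name__ == "__main__":
    blocked = BloomFilter()
    blocked.add("willing to do tributes")
    print(blocked.might_contain("willing to do tributes"))
```

Memory use is fixed by num_bits (8 KiB here) regardless of how long the phrases are. A True answer can be a false positive, so it would still need to be confirmed against the real list, and the message would first have to be split into candidate phrases, since the filter only answers exact-membership questions.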

-- ~jeff

majestrate, Jul 06 '23 12:07

This is a bad problem. With 1200 blocked phrases, the restart time is 1 minute 25 seconds and the memory usage is around 1 GB plus 500 MB of swap, which is near the full capacity of my server. I wish the developers would fix how sogs handles blocked words and the blocklist. I can no longer add new phrases, since sogs would not start at all.

slrslr, Nov 08 '23 04:11