Handle decoding errors in file inspection during filtering
Sentry Issue: BOT-436
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf5 in position 14: invalid start byte
File "bot/exts/filtering/filtering.py", line 237, in on_message
await _extract_text_file_content(a)
File "bot/exts/filtering/filtering.py", line 73, in _extract_text_file_content
file_lines = file_content_bytes.decode(file_encoding).splitlines()
Unhandled exception in on_message.
It's not obvious to me what would be appropriate here. Block the file? Ignore unknown characters?
@swfarnsworth your input would be appreciated
@mbaruh I think this is the first time this has happened?
We could just add errors='ignore'.
In this case it seems like a user uploaded a zip file with a txt extension. I assume the same could happen if you tried to share an executable with a txt extension? If so, it could be an indication of somebody trying to bypass filters, though it's seemingly rare, so not a high priority.
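For reference, a minimal sketch of what the errors='ignore' option could look like. This is not the bot's actual code: decode_attachment_lines is a hypothetical helper, and only the names file_content_bytes and file_encoding come from the traceback above. It tries a strict decode first, so the caller can still tell when a file was not valid text, which might be worth logging given the possible filter-bypass angle.

```python
def decode_attachment_lines(
    file_content_bytes: bytes,
    file_encoding: str = "utf-8",
) -> tuple[list[str], bool]:
    """Return (lines, decoded_cleanly) for an attachment's raw bytes.

    Hypothetical helper for illustration; not part of the bot's codebase.
    """
    try:
        # Strict decode: raises UnicodeDecodeError on bytes like 0xf5.
        return file_content_bytes.decode(file_encoding).splitlines(), True
    except UnicodeDecodeError:
        # Lossy fallback: undecodable bytes are silently dropped, so the
        # remaining text can still be run through the filters.
        text = file_content_bytes.decode(file_encoding, errors="ignore")
        return text.splitlines(), False


# Example: an invalid start byte at position 14, as in the Sentry report.
sample = b"hello world!!!" + b"\xf5" + b"\nsecond line"
lines, clean = decode_attachment_lines(sample)
```

With errors='replace' instead of 'ignore', the bad bytes would become U+FFFD rather than disappearing, which keeps their positions visible. Either way, an unknown encoding name would still raise LookupError, which this sketch does not handle.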
What was the value of file_encoding in this error?
"utf-8"