FIX: Crawler requests exceptions for non UTF-8 user agents with invalid bytes
Meta: https://meta.discourse.org/t/encoding-conversion-error-from-ascii-8bit-to-utf-8-in-logs/308603/2
Crawler requests with non-UTF-8 user agents that contain invalid bytes raise an exception in two places.
See the `get_data()` function:
- On `encode("utf-8")`, which raises either `InvalidByteSequenceError` or `UndefinedConversionError`, depending on the incoming encoding.
- On matching a user agent that contains invalid bytes, which raises `ArgumentError`. This is called from `helper.is_crawler` and `helper.is_mobile`, part of the `AnonymousCache::Helper` class.
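The three failure modes can be reproduced in plain Ruby, outside of Discourse. This is only an illustrative sketch: the user agent strings and the `raised_error_class` helper are made up for the example and are not part of the PR.

```ruby
# Hypothetical helper for this example: returns the class of the exception
# raised by the block, or nil if the block completes normally.
def raised_error_class
  yield
  nil
rescue StandardError => e
  e.class
end

# Binary (ASCII-8BIT) input: the high byte 0xFF has no defined mapping to UTF-8.
binary_ua = "Googlebot\xFF".b
encode_binary_error = raised_error_class { binary_ua.encode("utf-8") }

# EUC-JP input: "\xA1" followed by "z" is not a valid EUC-JP byte sequence.
euc_jp_ua = "Googlebot\xA1z".dup.force_encoding("EUC-JP")
encode_euc_error = raised_error_class { euc_jp_ua.encode("utf-8") }

# UTF-8-tagged input with an invalid byte: regex matching refuses broken strings.
broken_utf8_ua = "Googlebot\xFF".dup.force_encoding("UTF-8")
match_error = raised_error_class { broken_utf8_ua.match?(/googlebot/i) }

puts encode_binary_error
puts encode_euc_error
puts match_error
```

Which of the two encoding errors appears depends on the encoding the incoming string is tagged with, which is why both show up in logs.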
This PR does the following:
- Handles `encode()` exceptions by relying on the `undef` and `invalid` params to replace the faulty bytes instead of raising an exception. This logic has been moved into its own module.
- Provides a safe user agent in `AnonymousCache::Helper`.
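A minimal sketch of the replacement approach, assuming a standalone module. The name `SafeUserAgentSketch` and its `to_utf8` method are hypothetical and do not reflect the module actually added in this PR; only `String#encode` with the `invalid:` and `undef:` options is standard Ruby.

```ruby
# Hypothetical module for illustration only; not the module from the PR.
module SafeUserAgentSketch
  # Returns a UTF-8 string in which invalid or unconvertible bytes are
  # replaced with a replacement character instead of raising.
  def self.to_utf8(user_agent)
    return "" if user_agent.nil?

    user_agent.encode("utf-8", invalid: :replace, undef: :replace)
  end
end

# Binary input that would otherwise raise UndefinedConversionError.
safe_binary = SafeUserAgentSketch.to_utf8("Googlebot\xFF".b)

# UTF-8-tagged input with an invalid byte that would break regex matching.
safe_utf8 = SafeUserAgentSketch.to_utf8("Googlebot\xFF".dup.force_encoding("UTF-8"))

# Both results are valid UTF-8, so matching no longer raises ArgumentError.
puts safe_binary.valid_encoding?
puts safe_utf8.match?(/googlebot/i)
```

Replacing the faulty bytes keeps the rest of the user agent intact, so crawler and mobile detection can still match on the readable portion of the string.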
The tests in `anonymous_cache_spec.rb` specifically cover the methods `blocked_crawler?`, `key_is_modern_mobile_device?`, and `key_is_old_browser?`.
Hopefully, the implementation is okay.