FIX: Crawler requests exceptions for non UTF-8 user agents with invalid bytes

Open Arkshine opened this issue 1 year ago • 0 comments

Meta: https://meta.discourse.org/t/encoding-conversion-error-from-ascii-8bit-to-utf-8-in-logs/308603/2

Crawler requests whose user agent is not UTF-8 and contains invalid bytes raise exceptions in two places (see the get_data() function):

  • The encode("utf-8") call raises one of the following errors, depending on the incoming encoding:
    • InvalidByteSequenceError
    • UndefinedConversionError
  • Matching against a user agent that contains invalid bytes raises ArgumentError. The match is triggered by helper.is_crawler and helper.is_mobile, which are part of the AnonymousCache::Helper class.
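
To make the failure modes above concrete, here is a minimal Ruby sketch that reproduces each error class outside Discourse. The user-agent strings, the EUC-JP tag, and the error_class_for helper are assumptions chosen for illustration; they are not taken from get_data() or from the PR.

```ruby
# Hypothetical user-agent values. The byte \xFF is invalid in each tagged encoding.
binary_ua = "Bot\xFF".b                             # ASCII-8BIT (binary)
eucjp_ua  = "Bot\xFF".dup.force_encoding("EUC-JP")  # tagged EUC-JP, but \xFF is not valid EUC-JP
utf8_ua   = "Bot\xFF".dup.force_encoding("UTF-8")   # tagged UTF-8, but \xFF is not valid UTF-8

# Illustrative helper: returns the class of the exception raised by the block,
# or nil if nothing was raised.
def error_class_for
  yield
  nil
rescue StandardError => e
  e.class
end

# A high byte in a binary string has no defined mapping to UTF-8.
undefined_error = error_class_for { binary_ua.encode("utf-8") }

# A malformed byte sequence in the source encoding cannot be transcoded.
invalid_error = error_class_for { eucjp_ua.encode("utf-8") }

# Pattern matching refuses to run against a string with a broken encoding.
match_error = error_class_for { utf8_ua.match?(/bot/i) }

puts undefined_error # Encoding::UndefinedConversionError
puts invalid_error   # Encoding::InvalidByteSequenceError
puts match_error     # ArgumentError
```

Which of the two encoding errors appears depends on how the incoming string is tagged: bytes with no mapping in the target raise UndefinedConversionError, while bytes that are malformed in the source encoding raise InvalidByteSequenceError.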

This PR does the following:

  • Handles encode() exceptions by passing the undef and invalid params, so that faulty bytes are replaced instead of raising an exception. This logic has been moved into its own module.
  • Provides a safe user agent in AnonymousCache::Helper, so that matching no longer raises.
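
The approach described above can be sketched as follows. The SafeUserAgent module and its to_utf8 method are hypothetical names used only for illustration; the module actually introduced by the PR may be named and structured differently. Only String#encode with the invalid: and undef: options is relied on here.

```ruby
# Hypothetical module, not the one added by the PR.
module SafeUserAgent
  # Converts a user agent to UTF-8, replacing invalid or unmappable bytes
  # with the replacement character instead of raising.
  def self.to_utf8(user_agent)
    user_agent.encode("utf-8", invalid: :replace, undef: :replace)
  end
end

# Hypothetical inputs containing the byte \xFF.
binary_ua = "Bot\xFF".b
eucjp_ua  = "Bot\xFF".dup.force_encoding("EUC-JP")

safe_binary = SafeUserAgent.to_utf8(binary_ua)
safe_eucjp  = SafeUserAgent.to_utf8(eucjp_ua)

# Both results are valid UTF-8, so pattern matching no longer raises.
puts safe_binary.valid_encoding?  # true
puts safe_binary.match?(/bot/i)   # true
puts safe_eucjp.valid_encoding?   # true
```

Replacing faulty bytes rather than rejecting the request keeps the rest of the user agent available for crawler and mobile detection, at the cost of losing the original byte values.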

The anonymous_cache_spec.rb tests specifically cover the methods blocked_crawler?, key_is_modern_mobile_device?, and key_is_old_browser?.

Hopefully, the implementation is okay.

Arkshine, May 24 '24 16:05