fluent-logger-ruby

encoding issue UnicodeDecodeError

Open eredi93 opened this issue 4 years ago • 2 comments

Something within the logging pipeline is breaking encoding, but only for some characters. I'm having a hard time reproducing this issue and cannot pinpoint what is actually causing it, but it seems to happen at the fluentd level, in either the logger or fluentd itself.

I deployed fluentd in production and am sending events from the Rails app using this logger. The logger is configured to send events to fluentd, which writes them to S3 as gzip files. I then have a processing pipeline that consumes these files, and this is where I started seeing the issues.

client config

client = Fluent::Logger::FluentLogger.new(
  nil,
  host: "localhost",
  port: 24224,
  use_nonblock: true,
  wait_writeable: false
)
client.post("foo", event)

fluentd config

<match foo.**>
  @type s3
  @id   S3_output

  s3_bucket my-bucket
  s3_region us-east-1

  acl bucket-owner-full-control
  store_as gzip_command

  path preprocessed_logs/year=%Y/month=%-m/day=%-d/hour=%-H
  s3_object_key_format "%{path}/#{Socket.gethostname}_%{hex_random}_%{index}.%{file_extension}"

  <buffer time>
    timekey 300
    timekey_use_utc true
    timekey_wait 30
    @type file
    path /var/log/td-agent/buffer/foo
  </buffer>

  <format>
    @type json
  </format>
</match>

It seems that some characters are badly encoded. Here is an example user agent:

'Mozilla/5.0 (iPhone; CPU iPhone OS 13_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Clube da Fluência'

was logged as:

'Mozilla/5.0 (iPhone; CPU iPhone OS 13_4_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Clube da Flu\xeancia'

ê was changed to \xea, which breaks decoding.
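To make the symptom concrete: a single byte 0xEA is how ISO-8859-1 (Latin-1) encodes ê, whereas UTF-8 uses the two bytes 0xC3 0xAA. The following is a minimal Ruby sketch of that difference, using only core String methods. It illustrates why a UTF-8 decoder rejects the logged value; it does not show where in the pipeline the byte was introduced, which the thread leaves open.

```ruby
# The value reported in the issue: a lone 0xEA byte where "ê" was expected.
# .b returns a binary (ASCII-8BIT) copy, similar to raw bytes read from a file.
logged = "Flu\xEAncia".b

# Interpreted as UTF-8, the byte sequence is invalid.
utf8_valid = logged.dup.force_encoding(Encoding::UTF_8).valid_encoding?

# Interpreted as ISO-8859-1, the same bytes transcode cleanly to UTF-8.
recovered = logged.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)

# For comparison, correct UTF-8 encodes "ê" as two bytes.
expected_bytes = "\u00EA".bytes

puts utf8_valid                                                  # false
puts recovered                                                   # Fluência
puts expected_bytes.map { |b| format("%02x", b) }.join(" ")      # c3 aa
```

If the downstream files consistently contain Latin-1 bytes like this, decoding them as ISO-8859-1 may recover the text, but that is a workaround under that assumption rather than a fix for the root cause.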

Do you think this might have something to do with how the logger sends data to fluentd?

To add more context: I'm using this logger in a Rails app, and what I log is request information. I have checked the Rails side of things, and the string passed to the logger is UTF-8 encoded.
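One way to double-check the claim that every string reaching the logger is UTF-8 is to walk the event just before `client.post` and report any string that is either not tagged as UTF-8 or contains invalid bytes. The sketch below assumes the event is a nested structure of Hashes, Arrays, and Strings. `non_utf8_paths` is a hypothetical helper written for illustration and is not part of fluent-logger-ruby; the sample event is made up.

```ruby
# Hypothetical helper (not part of fluent-logger-ruby): returns the key paths
# of every String in a nested event that is not tagged as UTF-8 or that
# contains bytes invalid in UTF-8.
def non_utf8_paths(value, path = [])
  case value
  when String
    ok = value.encoding == Encoding::UTF_8 && value.valid_encoding?
    ok ? [] : [path.join(".")]
  when Hash
    value.flat_map { |k, v| non_utf8_paths(v, path + [k]) }
  when Array
    value.each_with_index.flat_map { |v, i| non_utf8_paths(v, path + [i]) }
  else
    []
  end
end

# Made-up sample event: one valid UTF-8 string and one binary string
# carrying a Latin-1 byte.
event = {
  "user_agent" => "Clube da Flu\u00EAncia",
  "headers" => { "referer" => "Flu\xEAncia".b }
}

bad = non_utf8_paths(event)
puts bad.inspect   # ["headers.referer"]
```

The check is deliberately strict: it also flags strings tagged as US-ASCII or binary even when their bytes happen to be valid UTF-8, since raw request headers are a plausible source of such strings in a Rails app.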

eredi93 avatar Jun 11 '20 21:06 eredi93

Fluentd treats data as binary by default. If you hit an encoding problem, one way to fix it is to convert the encoding using record_modifier or something similar.

https://docs.fluentd.org/quickstart/faq#i-got-encoding-error-inside-plugin-how-to-fix-it
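As a rough illustration of the record_modifier suggestion, a filter like the following could be placed before the `<match foo.**>` block. This is a sketch that assumes the fluent-plugin-record-modifier plugin is installed; the `foo.**` pattern is copied from the config above. Whether it resolves this particular case is not established in the thread.

```
<filter foo.**>
  @type record_modifier
  # Tag string values in each record as UTF-8.
  # The plugin also documents a "from:to" form for converting between encodings.
  char_encoding utf-8
</filter>
```

Tagging strings as UTF-8 does not repair bytes that are already invalid, such as a lone \xea, so the source of that byte would still need to be found.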

repeatedly avatar Jun 18 '20 04:06 repeatedly

@repeatedly thanks, let me try this next week.

eredi93 avatar Jul 07 '20 20:07 eredi93