UTF-8 encoding is not enforced on all messages

Open jasonwbarnett opened this issue 4 years ago • 3 comments

Environment

  • Ruby v2.7.3
  • semantic_logger v4.7.4

Expected Behavior

I would expect that all messages logged would have UTF-8 encoding forced.

Actual Behavior

E [9051:SemanticLogger::Appenders] SemanticLogger::Appenders -- Failed to log to appender: SemanticLogger::Appender::SplunkHttp -- Exception: Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/splunk_http.rb:102:in `to_json'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/splunk_http.rb:102:in `call'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/http.rb:165:in `log'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appenders.rb:20:in `block in log'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appenders.rb:18:in `each'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appenders.rb:18:in `log'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/async.rb:152:in `process_messages'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/async.rb:121:in `process'
/opt/ruby/embedded/lib/ruby/gems/2.7.0/gems/semantic_logger-4.7.4/lib/semantic_logger/appender/async.rb:77:in `block in thread'
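
For reference, the conversion failure in the trace can be reproduced outside the appender with a plain .to_json call on a binary-encoded string (a minimal sketch, not part of the original report):

```ruby
require "json"

# A string tagged ASCII-8BIT (binary) containing a byte that is invalid in UTF-8.
raw = "caf\xE2".dup.force_encoding(Encoding::ASCII_8BIT)

raw.to_json
# => Encoding::UndefinedConversionError: "\xE2" from ASCII-8BIT to UTF-8
```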

jasonwbarnett avatar Nov 11 '21 21:11 jasonwbarnett

Sounds like an issue with all JSON rendering, since the data is assumed to be UTF-8 compatible.

To fix the issue properly, we should fix all formatters that output JSON so that they convert all strings to valid UTF-8 prior to calling .to_json. It would have been nice if .to_json directly supported encoding conversion options to handle this condition.
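
As a rough illustration (not semantic_logger code), String#encode with replacement options can force a string into valid UTF-8 before serialization:

```ruby
require "json"

raw = "caf\xE2".dup.force_encoding(Encoding::ASCII_8BIT)

# Replace bytes that are invalid in, or cannot be converted to, UTF-8 instead of raising.
safe = raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")

safe.to_json # => "\"caf?\""
```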

The log attributes most at risk of containing non-UTF-8 data:

  • message
  • tags
  • named_tags
  • payload
  • exception (all messages need cleansing)

Alternatively, a recursive helper function could be written that walks the entire hash structure, fixing or stripping all non-UTF-8 characters before calling .to_json on it.
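
A minimal sketch of such a helper (hypothetical, not part of semantic_logger) could look like this:

```ruby
require "json"

# Recursively re-encode every String in a nested Hash/Array structure to valid UTF-8,
# replacing invalid or unconvertible bytes, before the structure is passed to .to_json.
def cleanse_utf8(value)
  case value
  when String
    if value.encoding == Encoding::UTF_8
      value.scrub("?") # already tagged UTF-8: replace any invalid byte sequences
    else
      value.encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: "?")
    end
  when Hash
    value.each_with_object({}) { |(k, v), h| h[cleanse_utf8(k)] = cleanse_utf8(v) }
  when Array
    value.map { |v| cleanse_utf8(v) }
  else
    value
  end
end

# Example: cleanse_utf8(message: "caf\xE2", tags: ["a\xE2"]).to_json
```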

reidmorrison avatar Dec 21 '21 16:12 reidmorrison

@jasonwbarnett I would be interested to know your experience with the SplunkHttp appender. We tried it a few years ago and found that our application became dependent on the availability of the Splunk HTTP servers.

For example, when the Splunk http servers were down for any longer than a few minutes, our Rails apps would run out of memory trying to hold all the logs in memory, waiting for the Splunk http servers to recover.

We have instead moved to an asynchronous model where we write the logs to an EBS volume and a Splunk listener picks up the logs at its leisure. This removes Splunk as a hard dependency for running our mission-critical Rails app.

Ideally we want to log to Kafka and have Splunk read from that instead. It would be interesting to know whether Splunk has made any progress on Kafka support in the last few years.

Of course, none of this fixes the non-UTF-8 issue above, which we definitely need to address.

reidmorrison avatar Dec 21 '21 16:12 reidmorrison

> @jasonwbarnett I would be interested to know your experience with the SplunkHttp appender. We tried it a few years ago and found that our application became dependent on the availability of the Splunk HTTP servers.
>
> For example, when the Splunk http servers were down for any longer than a few minutes, our Rails apps would run out of memory trying to hold all the logs in memory, waiting for the Splunk http servers to recover.

@reidmorrison The company I work for runs an extremely large Splunk infrastructure that is incredibly robust and reliable. So much so that when we asked Splunk about their SaaS offering, they said we were running better infrastructure than they were and that they couldn't meet our needs. All that to say, I don't believe that Splunk HEC reliability has ever been a problem for us.

jasonwbarnett avatar Dec 21 '21 21:12 jasonwbarnett