fluent-plugin-record-modifier

incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)

Open himanshigpta opened this issue 4 years ago • 0 comments

Problem

I'm getting the error below while shipping logs to ES via td-agent 1.11.1:

2020-11-01 17:11:42 +0530 [error]: #0 incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
  2020-11-01 17:11:42 +0530 [error]: #0 suppressed same stacktrace
2020-11-01 17:11:42 +0530 [error]: #0 incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/parser_regexp.rb:50:in `match'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/parser_regexp.rb:50:in `parse'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-grok-parser-2.6.1/lib/fluent/plugin/parser_multiline_grok.rb:21:in `block in parse'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-grok-parser-2.6.1/lib/fluent/plugin/parser_multiline_grok.rb:20:in `each'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluent-plugin-grok-parser-2.6.1/lib/fluent/plugin/parser_multiline_grok.rb:20:in `parse'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:546:in `block in parse_multilines'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:544:in `each'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:544:in `parse_multilines'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:469:in `call'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:469:in `receive_lines'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:845:in `block in handle_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:877:in `with_io'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:825:in `handle_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:808:in `block in on_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:808:in `synchronize'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:808:in `on_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:653:in `on_notify'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:325:in `block in setup_watcher'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin/in_tail.rb:596:in `on_timer'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run_once'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/cool.io-1.6.0/lib/cool.io/loop.rb:88:in `run'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin_helper/event_loop.rb:93:in `block in start'
  2020-11-01 17:11:42 +0530 [error]: #0 /opt/td-agent/lib/ruby/gems/2.7.0/gems/fluentd-1.11.1/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2020-11-01 17:11:43 +0530 [error]: #0 incompatible encoding regexp match (UTF-8 regexp with ASCII-8BIT string)
  2020-11-01 17:11:43 +0530 [error]: #0 suppressed same stacktrace

I've added the char_encoding parameter described at https://github.com/repeatedly/fluent-plugin-record-modifier#char_encoding, as recommended in the FAQ at https://docs.fluentd.org/quickstart/faq, but the issue persists.
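
For reference, the same class of error can be reproduced in plain Ruby, outside fluentd. The sketch below is only an illustration under assumptions: `match_outcome` is a hypothetical helper, and the sample regexp and string are made up rather than taken from the grok pattern or from /var/log/messages. It shows that a UTF-8 regexp fails against a string tagged ASCII-8BIT that contains non-ASCII bytes, and that the same bytes match once they are tagged as UTF-8. Note that the stack trace above points at parser_regexp.rb called from in_tail.rb, so the exception is raised while the source is parsing the line, before the record_modifier filter runs.

```ruby
# Hypothetical helper (not part of fluentd or any plugin): reports how a
# match attempt ends instead of letting the exception propagate.
def match_outcome(regexp, line)
  regexp.match(line) ? :matched : :no_match
rescue Encoding::CompatibilityError
  :incompatible_encoding
end

# A regexp built from a pattern with a non-ASCII character is fixed to UTF-8.
utf8_regexp = Regexp.new("caf\u00e9")

# The UTF-8 bytes of "café", tagged as ASCII-8BIT (binary).
binary_line = "caf\xC3\xA9".b

# The same bytes, relabelled as UTF-8 without changing them.
utf8_line = binary_line.dup.force_encoding(Encoding::UTF_8)

puts match_outcome(utf8_regexp, binary_line) # prints "incompatible_encoding"
puts match_outcome(utf8_regexp, utf8_line)   # prints "matched"
```

A binary string that contains only ASCII bytes does not trigger the error, which may be why the same configuration behaves differently on servers whose logs contain no non-ASCII characters; that is an assumption, not something verified against these servers.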

...

Steps to replicate

Provide example config and message

# encoding: utf-8
<source>
  @type tail
  path /var/log/messages
  pos_file /etc/td-agent/new_var_log_msg_grok.log.pos
  #time_format %Y-%m-%dT%H:%M:%S.%NZ
  time_format %b %dT%H:%M:%SZ
  tag var.msg
  <parse>
    @type multiline_grok
    <grok>
     pattern %{SYSLOGTIMESTAMP:time}%{SPACE}%{HOSTNAME:hostname}%{SPACE}%{GREEDYDATA:service_name}:%{GREEDYDATA:log_message}
    </grok>
  </parse>
</source>

<filter var.msg>
    @type record_modifier
     <record>
     hostname "#{Socket.gethostname}"
     formatted_time ${Time.at(time).iso8601(3)}
     char_encoding utf-8
     char_encoding utf-8:euc-jp
     </record>
</filter>

<match var.msg>
  @type elasticsearch
#  type_name "_doc"
  hosts redacted:9200
  scheme "https"
  ssl_version TLSv1_2
  ssl_verify false
  ca_file "/etc/td-agent/cert.crt"
  user redacted
  password redacted
  reload_connections false
  reconnect_on_error true
  reload_on_failure true
  log_es_400_reason false
  logstash_prefix messages_logs
  logstash_format true
  logstash_dateformat %V
  index_name "messages_logs"
  type_name "fluentd"
  include_timestamp true
  <buffer>
    @type file
    path /etc/td-agent/messages/buffers
    chunk_limit_size 1M
    flush_interval 5s
    retry_forever false
    retry_max_times 3
    retry_wait 10
    retry_max_interval 300
    flush_thread_count 8
  </buffer>
</match>


Expected Behavior or What you need to ask

The same config works fine on most servers, even without the char_encoding parameter. td-agent of the same version should behave the same way across servers with the same configuration, and the error should go away after adding the encoding parameter. ...

Using Fluentd and ES plugin versions

  • OS version Red Hat Enterprise Linux Server release 7.9 (Maipo)

  • Fluentd v0.12 or v0.14/v1.0

    td-agent --version

    td-agent 1.11.1

  • ES plugin 3.x.y/2.x.y or 1.x.y

    • paste result of fluent-gem list, td-agent-gem list or your Gemfile.lock
  td-agent-gem list

*** LOCAL GEMS ***

addressable (2.7.0)
async (1.26.2)
async-http (0.52.4)
async-io (1.30.0)
async-pool (0.3.2)
aws-eventstream (1.1.0)
aws-partitions (1.337.0)
aws-sdk-core (3.102.1)
aws-sdk-kms (1.35.0)
aws-sdk-s3 (1.72.0)
aws-sdk-sqs (1.29.0)
aws-sigv4 (1.2.1)
benchmark (default: 0.1.0)
bigdecimal (default: 2.0.0)
bundler (2.1.4)
cgi (default: 0.1.0)
concurrent-ruby (1.1.6)
console (1.8.2)
cool.io (1.6.0)
csv (default: 3.1.2)
date (default: 3.0.0)
delegate (default: 0.1.0)
did_you_mean (default: 1.4.0)
digest-crc (0.6.1)
elasticsearch (7.8.0)
elasticsearch-api (7.8.0)
elasticsearch-transport (7.8.0)
elasticsearch-xpack (7.9.0)
etc (default: 1.1.0)
excon (0.75.0)
faraday (1.0.1)
fcntl (default: 1.0.0)
ffi (1.13.1)
fiddle (default: 1.0.0)
fileutils (default: 1.4.1)
fluent-config-regexp-type (1.0.0)
fluent-logger (0.8.2)
fluent-plugin-concat (2.4.0)
fluent-plugin-elasticsearch (4.1.1, 4.0.9)
fluent-plugin-grok-parser (2.6.1)
fluent-plugin-kafka (0.13.0)
fluent-plugin-prometheus (1.8.0)
fluent-plugin-prometheus_pushgateway (0.0.2)
fluent-plugin-record-modifier (2.1.0)
fluent-plugin-rewrite-tag-filter (2.3.0)
fluent-plugin-s3 (1.3.3)
fluent-plugin-systemd (1.0.2)
fluent-plugin-td (1.1.0)
fluent-plugin-td-monitoring (1.0.0)
fluent-plugin-webhdfs (1.2.5)
fluentd (1.11.1)
forwardable (default: 1.3.1)
getoptlong (default: 0.1.0)
hirb (0.7.3)
http_parser.rb (0.6.0)
httpclient (2.8.2.4)
io-console (default: 0.5.6)
ipaddr (default: 1.2.2)
ipaddress (0.8.3)
irb (default: 1.2.3)
jmespath (1.4.0)
json (default: 2.3.0)
logger (default: 1.4.2)
ltsv (0.1.2)
matrix (default: 0.2.0)
mini_portile2 (2.5.0)
minitest (5.13.0)
mixlib-cli (1.7.0)
mixlib-config (2.2.3)
mixlib-log (1.7.1)
mixlib-shellout (2.2.7)
msgpack (1.3.3)
multi_json (1.14.1)
multipart-post (2.1.1)
mutex_m (default: 0.1.0)
net-pop (default: 0.1.0)
net-smtp (default: 0.1.0)
net-telnet (0.2.0)
nio4r (2.5.2)
nokogiri (1.11.0.rc2)
observer (default: 0.1.0)
ohai (6.20.0)
oj (3.10.6)
open3 (default: 0.1.0)
openssl (default: 2.1.2)
ostruct (default: 0.2.0)
parallel (1.19.2)
power_assert (1.1.7)
prime (default: 0.1.1)
prometheus-client (0.9.0)
protocol-hpack (1.4.2)
protocol-http (0.20.0)
protocol-http1 (0.13.0)
protocol-http2 (0.14.0)
pstore (default: 0.1.0)
psych (default: 3.1.0)
public_suffix (4.0.5)
quantile (0.2.1)
racc (default: 1.4.16)
rake (13.0.1)
rdkafka (0.8.0)
rdoc (default: 6.2.1)
readline (default: 0.0.2)
readline-ext (default: 0.1.0)
reline (default: 0.1.3)
rexml (default: 3.2.3)
rss (default: 0.2.8)
ruby-kafka (1.1.0)
ruby-progressbar (1.10.1)
rubyzip (1.3.0)
sdbm (default: 1.0.0)
serverengine (2.2.1)
sigdump (0.2.4)
singleton (default: 0.1.0)
stringio (default: 0.1.0)
strptime (0.2.4)
strscan (default: 1.0.3)
systemd-journal (1.3.3)
systemu (2.5.2)
td (0.16.9)
td-client (1.0.7)
td-logger (0.3.27)
test-unit (3.3.4)
timeout (default: 0.1.0)
timers (4.3.0)
tracer (default: 0.1.0)
tzinfo (2.0.2)
tzinfo-data (1.2020.1)
uri (default: 0.10.0)
webhdfs (0.9.0)
webrick (default: 1.6.0)
xmlrpc (0.3.0)
yajl-ruby (1.4.1)
yaml (default: 0.1.0)
zip-zip (0.3)
zlib (default: 1.1.0)
  • ES version (optional) 7.5.1

himanshigpta, Nov 01 '20 12:11