apm-agent-ruby icon indicating copy to clipboard operation
apm-agent-ruby copied to clipboard

Allow UTF-8 in request headers

Open kryachkov opened this issue 5 years ago • 5 comments

Hi,

Not sure if it's a bug report or feature request or even general discussion. We experience an issue with agent not being able to convert to JSON transactions that contain UTF-8 characters in HTTP headers. From what I understand it's against RFC to have UTF-8 characters in headers, but ....

The issue we have with it is nginx geoip module appending city name, that contains UTF-8 characters and then agent is not able to push transaction to APM.

What do you think?

kryachkov avatar Feb 03 '20 08:02 kryachkov

I think we should at least not fail. What is the experienced behaviour? Is the agent logging JSON conversion errors or is something exploding?

mikker avatar Feb 03 '20 09:02 mikker

Agent logs an error and does not interfere with request handing.

E, [2020-02-03T10:44:18.518061 #88889] ERROR -- : [ElasticAPM] Failed converting event to JSON: <ElasticAPM::Transaction id:738f09e51ebc182c name:"V1::DateController#show" type:"request">
E, [2020-02-03T10:44:18.518410 #88889] ERROR -- : [ElasticAPM] {:transaction=>{:id=>"738f09e51ebc182c", :trace_id=>"0ed39839ab8a61a0879ab50d365bc312", :parent_id=>nil, :name=>"V1::DateController#show", :type=>"request", :result=>"HTTP 2xx", :duration=>60.87, :timestamp=>1580723058456334, :sampled=>true, :context=>{:custom=>{}, :tags=>{}, :request=>{:body=>"[SKIPPED]", :cookies=>{}, :env=>{"SCRIPT_NAME"=>"", "QUERY_STRING"=>"", "SERVER_PROTOCOL"=>"HTTP/1.1", "SERVER_SOFTWARE"=>"puma 4.3.1 Mysterious Traveller", "GATEWAY_INTERFACE"=>"CGI/1.2", "REQUEST_METHOD"=>"GET", "REQUEST_PATH"=>"/date", "REQUEST_URI"=>"/date", "SERVER_NAME"=>"localhost", "SERVER_PORT"=>"3000", "PATH_INFO"=>"/date", "REMOTE_ADDR"=>"127.0.0.1", "ROUTES_70153857366120_SCRIPT_NAME"=>"", "ORIGINAL_FULLPATH"=>"/date", "ORIGINAL_SCRIPT_NAME"=>""}, :headers=>{"Version"=>"HTTP/1.1", "Host"=>"localhost:3000", "Accept"=>"application/vnd.casinosaga.v1", "User-Agent"=>"hall\xC3\xA5"}, :http_version=>"1.1", :method=>"GET", :socket=>{:remote_addr=>"127.0.0.1", :encrypted=>false}, :url=>{:protocol=>"http", :full=>"http://localhost:3000/date", :hostname=>"localhost", :port=>"3000", :pathname=>"/date", :search=>"", :hash=>nil}}, :response=>{:status_code=>200, :headers=>{"Content-Type"=>"application/json; charset=utf-8", "ETag"=>"W/\"369d2917c2733d314c0eb1f83e4a3f96\"", "Cache-Control"=>"max-age=0, private, must-revalidate", "X-Request-Id"=>"fe6b24d7-861c-474b-9c57-44997c2cd2be", "X-Runtime"=>"0.058751"}, :headers_sent=>true, :finished=>true}, :user=>nil, :service=>nil}, :span_count=>{:started=>1, :dropped=>0}}}

Note User-Agent header value.

I would expect it to send data to APM without any errors. Rails handles UTF-8 headers more or less fine:

I, [2020-02-03T10:40:20.303503 #83382]  INFO -- : Processing by V1::DateController#show as

From: /api/app/controllers/v1/date_controller.rb @ line 5 V1::DateController#show:

    3: def show
    4:   require 'pry'; binding.pry
 => 5:   render json: { 'date' => Time.zone.now.utc }
    6: end

[1] pry(#<V1::DateController>)> request.headers['User-Agent']
=> "hall\xC3\xA5"
[2] pry(#<V1::DateController>)> request.headers['User-Agent'].encoding
=> #<Encoding:ASCII-8BIT>
[4] pry(#<V1::DateController>)> request.headers['User-Agent'].force_encoding('utf-8')
=> "hallå"

kryachkov avatar Feb 03 '20 09:02 kryachkov

I suppose we could add the encoding forcing to the agent. I'll discuss it with @estolfo when she's back in a few days. Until then, do you think something like https://github.com/whitequark/rack-utf8_sanitizer could fix your problem?

mikker avatar Feb 03 '20 10:02 mikker

I'll probably fork for now, and apply this changes:

diff --git a/lib/elastic_apm/context_builder.rb b/lib/elastic_apm/context_builder.rb
index b523219..f7e8089 100644
--- a/lib/elastic_apm/context_builder.rb
+++ b/lib/elastic_apm/context_builder.rb
@@ -76,9 +76,9 @@ module ElasticAPM
         next unless key == key.upcase

         if key.start_with?('HTTP_')
-          http[camel_key(key)] = value
+          http[camel_key(key)] = value.dup.force_encoding('utf-8')
         else
-          env[key] = value
+          env[key] = value.dup.force_encoding('utf-8')
         end
       end
     end

until you guys decide if this change is needed and how it should be done properly

kryachkov avatar Feb 03 '20 11:02 kryachkov

I'm going to hold off adding a fix for this in the next gem version, as @mikker and I didn't come to a decision. When he returns, we can decide whether to handle this issue in the agent. In the meantime, anyone who has this issue might be interested in the suggestion made above.

estolfo avatar Apr 20 '20 09:04 estolfo