apm-agent-ruby
apm-agent-ruby copied to clipboard
Allow UTF-8 in request headers
Hi,
Not sure if it's a bug report or feature request or even general discussion. We experience an issue with agent not being able to convert to JSON transactions that contain UTF-8 characters in HTTP headers. From what I understand it's against RFC to have UTF-8 characters in headers, but ....
The issue we have with it is nginx geoip module appending city name, that contains UTF-8 characters and then agent is not able to push transaction to APM.
What do you think?
I think we should at least not fail. What is the experienced behaviour? Is the agent logging JSON conversion errors or is something exploding?
Agent logs an error and does not interfere with request handing.
E, [2020-02-03T10:44:18.518061 #88889] ERROR -- : [ElasticAPM] Failed converting event to JSON: <ElasticAPM::Transaction id:738f09e51ebc182c name:"V1::DateController#show" type:"request">
E, [2020-02-03T10:44:18.518410 #88889] ERROR -- : [ElasticAPM] {:transaction=>{:id=>"738f09e51ebc182c", :trace_id=>"0ed39839ab8a61a0879ab50d365bc312", :parent_id=>nil, :name=>"V1::DateController#show", :type=>"request", :result=>"HTTP 2xx", :duration=>60.87, :timestamp=>1580723058456334, :sampled=>true, :context=>{:custom=>{}, :tags=>{}, :request=>{:body=>"[SKIPPED]", :cookies=>{}, :env=>{"SCRIPT_NAME"=>"", "QUERY_STRING"=>"", "SERVER_PROTOCOL"=>"HTTP/1.1", "SERVER_SOFTWARE"=>"puma 4.3.1 Mysterious Traveller", "GATEWAY_INTERFACE"=>"CGI/1.2", "REQUEST_METHOD"=>"GET", "REQUEST_PATH"=>"/date", "REQUEST_URI"=>"/date", "SERVER_NAME"=>"localhost", "SERVER_PORT"=>"3000", "PATH_INFO"=>"/date", "REMOTE_ADDR"=>"127.0.0.1", "ROUTES_70153857366120_SCRIPT_NAME"=>"", "ORIGINAL_FULLPATH"=>"/date", "ORIGINAL_SCRIPT_NAME"=>""}, :headers=>{"Version"=>"HTTP/1.1", "Host"=>"localhost:3000", "Accept"=>"application/vnd.casinosaga.v1", "User-Agent"=>"hall\xC3\xA5"}, :http_version=>"1.1", :method=>"GET", :socket=>{:remote_addr=>"127.0.0.1", :encrypted=>false}, :url=>{:protocol=>"http", :full=>"http://localhost:3000/date", :hostname=>"localhost", :port=>"3000", :pathname=>"/date", :search=>"", :hash=>nil}}, :response=>{:status_code=>200, :headers=>{"Content-Type"=>"application/json; charset=utf-8", "ETag"=>"W/\"369d2917c2733d314c0eb1f83e4a3f96\"", "Cache-Control"=>"max-age=0, private, must-revalidate", "X-Request-Id"=>"fe6b24d7-861c-474b-9c57-44997c2cd2be", "X-Runtime"=>"0.058751"}, :headers_sent=>true, :finished=>true}, :user=>nil, :service=>nil}, :span_count=>{:started=>1, :dropped=>0}}}
Note User-Agent header value.
I would expect it to send data to APM without any errors. Rails handles UTF-8 headers more or less fine:
I, [2020-02-03T10:40:20.303503 #83382] INFO -- : Processing by V1::DateController#show as
From: /api/app/controllers/v1/date_controller.rb @ line 5 V1::DateController#show:
3: def show
4: require 'pry'; binding.pry
=> 5: render json: { 'date' => Time.zone.now.utc }
6: end
[1] pry(#<V1::DateController>)> request.headers['User-Agent']
=> "hall\xC3\xA5"
[2] pry(#<V1::DateController>)> request.headers['User-Agent'].encoding
=> #<Encoding:ASCII-8BIT>
[4] pry(#<V1::DateController>)> request.headers['User-Agent'].force_encoding('utf-8')
=> "hallå"
I suppose we could add the encoding forcing to the agent. I'll discuss it with @estolfo when she's back in a few days. Until then, do you think something like https://github.com/whitequark/rack-utf8_sanitizer could fix your problem?
I'll probably fork for now, and apply this changes:
diff --git a/lib/elastic_apm/context_builder.rb b/lib/elastic_apm/context_builder.rb
index b523219..f7e8089 100644
--- a/lib/elastic_apm/context_builder.rb
+++ b/lib/elastic_apm/context_builder.rb
@@ -76,9 +76,9 @@ module ElasticAPM
next unless key == key.upcase
if key.start_with?('HTTP_')
- http[camel_key(key)] = value
+ http[camel_key(key)] = value.dup.force_encoding('utf-8')
else
- env[key] = value
+ env[key] = value.dup.force_encoding('utf-8')
end
end
end
until you guys decide if this change is needed and how it should be done properly
I'm going to hold off adding a fix for this in the next gem version, as @mikker and I didn't come to a decision. When he returns, we can decide whether to handle this issue in the agent. In the meantime, anyone who has this issue might be interested in the suggestion made above.