logstash-input-http
Error handling charsets other than UTF-8
- Version: 3.3.5
- Operating System:
- Config File (if you have sensitive info, please remove it):
input {
  http {
    port => 9006
    codec => plain {
      charset => "CP1254"
    }
  }
}
output {
  stdout {
    codec => json { charset => "UTF-8" }
  }
}
- Sample Data: Python script used as a client to send CP1254-encoded data
import requests

API_ENDPOINT = "http://127.0.0.1:9006"
message = 'TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç'
# Send the message body as raw CP1254-encoded bytes
r = requests.post(url=API_ENDPOINT, data=message.encode('cp1254'))
- Steps to Reproduce:
- Run Logstash with the pipeline above
- Execute the Python script
- The console output is:
{"message":"T�RK�E karakter test : ������������","@version":"1","@timestamp":"2020-11-30T10:38:55.338Z","headers":{"connection":"keep-alive","request_method":"POST","http_accept":"*/*","http_user_agent":"python-requests/2.21.0","content_length":"35","http_version":"HTTP/1.1","http_host":"127.0.0.1:9006","request_path":"/","accept_encoding":"gzip, deflate"},"host":"127.0.0.1"}
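The garbled output above is what one would see if the CP1254 bytes were treated as UTF-8 at some point before the configured charset is applied. That is only a hypothesis about the cause, not something confirmed from the plugin's code. The minimal Python sketch below reproduces the symptom outside Logstash; the variable names are illustrative.

```python
# Illustration only: this is not the plugin's code, just the same symptom
# reproduced with Python's own codecs.
MESSAGE = "TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç"

# What the client sends: CP1254 is a single-byte encoding, so the body
# has one byte per character (matching the reported content_length).
payload = MESSAGE.encode("cp1254")

# Decoding with the charset configured on the codec recovers the text.
decoded_ok = payload.decode("cp1254")

# Assumption: somewhere the bytes are interpreted as UTF-8 instead.
# Bytes that are not valid UTF-8 are replaced with U+FFFD.
decoded_wrong = payload.decode("utf-8", errors="replace")

print(len(payload))
print(decoded_ok)
print(decoded_wrong)
```

Every non-ASCII character in the message maps to a byte that is invalid as UTF-8 in this position, which is consistent with each of them showing up as a replacement character in the event.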
This does not seem to be a problem in the codec, because the same codec works correctly with a different input. I tried this pipeline:
input {
  file {
    path => "/tmp/cp1254_encoded.txt"
    mode => "read"
    sincedb_path => "/dev/null"
    file_completed_log_path => "/tmp/file_actions.log"
    file_completed_action => "log"
    codec => plain {
      charset => "CP1254"
    }
  }
}
output {
  stdout {
    codec => json { charset => "UTF-8" }
  }
}
With the attached file cp1254_encoded.txt as input data, the console output is what's expected (TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç).
NB: to reproduce the text file, copy and paste the string above into a text editor and save it with CP1254 encoding.
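As an alternative to a text editor, the file can also be generated from a script. The sketch below is one way to do it in Python; the helper name `write_cp1254_sample` is hypothetical, and the trailing newline is an assumption (added so that a line-oriented reader sees a complete line).

```python
from pathlib import Path

MESSAGE = "TÜRKÇE karakter test : ĞÜŞİÇÖışüğöç"


def write_cp1254_sample(path):
    """Write MESSAGE plus a trailing newline to `path`, encoded as CP1254.

    Returns the bytes that were written.
    """
    # Assumption: a trailing newline terminates the single line of text.
    data = (MESSAGE + "\n").encode("cp1254")
    Path(path).write_bytes(data)
    return data
```

Calling `write_cp1254_sample("/tmp/cp1254_encoded.txt")` would create the file at the path used by the file input pipeline above.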
Hi guys, any progress on this issue?
Hi @GokcerBelgusen, no news on this yet, but I'll keep it on my radar.