serilog-sinks-elasticsearch
LogShipper tries to send incomplete/premature JSON.
- [x] Bug
- [ ] New Feature
What version of Serilog.Sinks.Elasticsearch is affected? Please list the related NuGet package. Serilog.Sinks.Elasticsearch Version=8.4.1
What is the target framework and operating system? netcoreapp3.1 Linux based docker container
Please describe the current behavior?
Some log entries fail to be shipped/accepted. Serilog reports errors such as:
- HTTP shipping failed with 200: Successful (200) low level call on POST: /_bulk; dumping payload to xxx/invalid-200-xxx.json
- Received failed ElasticSearch shipping result 400: type: mapper_parsing_exception, reason: failed to parse. Failed payload: << partial, non-complete json here >>
We believe the sink tries to send premature JSON. When we inspect the reported invalid-200-xxx.json file we find partial/truncated/unterminated JSON. When we locate the same entry (by unique identifier/correlation) in the elastic_buffer-xxx.json file, the JSON persisted to the durable buffer is complete and correctly formatted.
We are logging large entries (HTTP request/response bodies) that are larger than 1024 bytes. Our suspicion is that the Serilog file sink (used as the durable file buffer) flushes partial log entries when they exceed 1024 bytes, and with a bit of unfortunate timing the periodic log shipment picks up the partially written log entry.
Please describe the expected behavior? LogShipper should only try to send complete JSON.
If the current behavior is a bug, please provide the steps to reproduce the issue and if possible a minimal demo of the problem
- With default periodic shipment (5 sec)
- Entries with ~1500 UTF-8 characters
- Rate of ~2000 entries per hour
With the above load, we experience roughly one lost/failed log entry per hour. This will most likely behave differently depending on the I/O speed of the storage.
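A minimal configuration along these lines shows the shape of our setup (a sketch, not our production code; the endpoint, index format, and buffer path are placeholders, and the options used are the standard ElasticsearchSinkOptions properties as we understand them):

```csharp
using System;
using Serilog;
using Serilog.Sinks.Elasticsearch;

class Program
{
    static void Main()
    {
        // Durable (file-buffered) Elasticsearch sink; URL, index and paths are placeholders.
        Log.Logger = new LoggerConfiguration()
            .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
            {
                IndexFormat = "app-logs-{0:yyyy.MM.dd}",
                // Setting BufferBaseFilename switches the sink to the durable LogShipper path.
                BufferBaseFilename = "./buffer/elastic_buffer"
            })
            .CreateLogger();

        // Entries of roughly the size we see failing (~1500 characters).
        var largeBody = new string('x', 1500);
        for (var i = 0; i < 50; i++)
        {
            Log.Information("Request {CorrelationId} body: {Body}", Guid.NewGuid(), largeBody);
        }

        Log.CloseAndFlush();
    }
}
```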
Thanks for the report. Unfortunately the durable sink is not ideal and is hard to maintain. I have not seen this behaviour myself, though. Did you try the BufferFileSizeLimitBytes option? Not sure it will help here, however.
> Did you try the BufferFileSizeLimitBytes option?
My understanding is that the BufferFileSizeLimitBytes only impacts how much the offline buffer is allowed to grow?
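For reference, this is roughly how we would set it (a sketch; the value is arbitrary, and as far as we can tell the limit only caps buffer file growth rather than changing how partially flushed entries are read back):

```csharp
// Sketch: same durable configuration as above, with the buffer file size capped.
var options = new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
{
    BufferBaseFilename = "./buffer/elastic_buffer",
    // Caps how large an individual buffer file may grow before rolling;
    // it does not appear to change how partially flushed entries are read back.
    BufferFileSizeLimitBytes = 100L * 1024 * 1024
};
```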
I think the problem is related to how SharedFileSink.AtomicAppend.cs works with the write buffer (EmitOrOverflow), combined with a timing issue when the PayloadReader is triggered by the LogShipper.
The buffering in AtomicAppend (and the underlying StreamWriter) writes a single large log entry in chunks, and the PayloadReader may read one or more chunks but not the complete log entry, depending on timing and the size of the entry.
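To illustrate the hazard, here is a small sketch (not the sink's actual code) of a reader that only commits whole, newline-terminated lines from the buffer file; anything after the last newline is treated as an entry that is still being written and is left for the next pass:

```csharp
using System;
using System.IO;
using System.Text;

static class PartialReadDemo
{
    // Reads at most maxBytes from the buffer file starting at 'position' and
    // returns only whole, newline-terminated lines. Bytes after the last
    // newline are assumed to belong to an entry that is still being written.
    public static (string[] CompleteLines, long NextPosition) ReadCompleteLines(
        string path, long position, int maxBytes)
    {
        using var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        stream.Seek(position, SeekOrigin.Begin);

        var buffer = new byte[maxBytes];
        var read = stream.Read(buffer, 0, maxBytes);
        if (read <= 0)
            return (Array.Empty<string>(), position);

        // Only commit up to the last newline; the tail is a partial entry.
        var lastNewline = Array.LastIndexOf(buffer, (byte)'\n', read - 1);
        if (lastNewline < 0)
            return (Array.Empty<string>(), position);

        var text = Encoding.UTF8.GetString(buffer, 0, lastNewline + 1);
        var lines = text.Split('\n', StringSplitOptions.RemoveEmptyEntries);
        return (lines, position + lastNewline + 1);
    }
}
```

In the failing case, the trailing bytes of a large entry have been flushed without a terminating newline yet, which is exactly the situation a guard like this would skip over until the entry is complete.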
@Fredrik-C Is there a workaround to prevent this from happening? We are facing the same issue.
@karthi89msc
We haven't found any workaround besides limiting the size of our log entries. But since we log bus messages and HTTP bodies, we will lose some log entries.
Someday (when there is time to spare) I might attempt to solve this in a pull request with a simple JSON validator that validates the payload before it is sent/shipped (the problem is handling how long to keep the payload in quarantine while waiting for it to eventually be completed).
Validation would go here: https://github.com/serilog/serilog-sinks-elasticsearch/blob/6cafc3e94551e81b2e9e3bf33ecf095764e8b0c4/src/Serilog.Sinks.Elasticsearch/Sinks/ElasticSearch/Durable/LogShipper.cs#L172
It could also be caused by the buffer size here: https://github.com/serilog/serilog-sinks-elasticsearch/blob/6cafc3e94551e81b2e9e3bf33ecf095764e8b0c4/src/Serilog.Sinks.Elasticsearch/Sinks/ElasticSearch/Durable/APayloadReader.cs#L105
But I have not had time to experiment with this.
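As a rough illustration of the idea (names are hypothetical, and System.Text.Json is assumed to be available), the pre-ship check could be as simple as:

```csharp
using System.Text.Json;

static class PayloadValidation
{
    // Hypothetical pre-ship check: true only if the buffered line is a
    // complete, well-formed JSON object. Incomplete lines (still being
    // written by the file sink) would stay in the buffer for the next pass.
    public static bool IsCompleteJson(string line)
    {
        if (string.IsNullOrWhiteSpace(line))
            return false;

        try
        {
            using var doc = JsonDocument.Parse(line);
            return doc.RootElement.ValueKind == JsonValueKind.Object;
        }
        catch (JsonException)
        {
            return false; // truncated or otherwise malformed payload
        }
    }
}
```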
Any chance to have this fixed?
@Fredrik-C We have this problem, too. Did you get a chance to look into the problem?
A fix seems unlikely. As an alternative to this library, consider writing logs to a file in JSON format and shipping them to Elasticsearch with Fluent Bit.
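If it helps anyone, the Serilog side of that alternative could look roughly like this (a sketch assuming the Serilog.Sinks.File and Serilog.Formatting.Compact packages; Fluent Bit would then tail the rolled files and forward them to Elasticsearch):

```csharp
using Serilog;
using Serilog.Formatting.Compact;

class Program
{
    static void Main()
    {
        // Sketch: write newline-delimited JSON (CLEF) to rolling files; a Fluent Bit
        // tail input then ships the files to Elasticsearch instead of the durable sink.
        Log.Logger = new LoggerConfiguration()
            .WriteTo.File(
                new CompactJsonFormatter(),
                "./logs/app-.json",
                rollingInterval: RollingInterval.Day)
            .CreateLogger();

        Log.Information("Hello from the file-based pipeline");
        Log.CloseAndFlush();
    }
}
```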