
LogShipper tries to send incomplete/premature JSON.

Open Fredrik-C opened this issue 4 years ago • 7 comments

  • [x] Bug
  • [ ] New Feature

What version of Serilog.Sinks.Elasticsearch is affected? Please list the related NuGet package.
Serilog.Sinks.Elasticsearch Version=8.4.1

What is the target framework and operating system?
netcoreapp3.1, in a Linux-based Docker container

Please describe the current behavior?

Some log entries fail to be shipped/accepted. Serilog reports errors such as:

  • HTTP shipping failed with 200: Successful (200) low level call on POST: /_bulk; dumping payload to xxx/invalid-200-xxx.json
  • Received failed ElasticSearch shipping result 400: type: mapper_parsing_exception, reason: failed to parse. Failed payload : << enter partial non-complete json here >>

We believe the sink tries to send premature JSON. When we inspect the reported invalid-200-xxx.json file, we find partial/truncated/unterminated JSON. When we inspect the elastic_buffer-xxx.json file and locate the failed entry (by a unique identifier/correlation id), we find the complete and correctly formatted JSON persisted in the durable buffer file.

We are logging large entries (HTTP request/response bodies) that exceed 1024 bytes. Our suspicion is that the Serilog file sink (used for the durable file buffer) flushes partial log entries once they grow past 1024 bytes, and with a bit of unfortunate timing the periodic log shipment picks up the partially written entry.
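For illustration only (this is not the sink's code; the file name and buffer sizes are made up for the demo), a plain StreamWriter with a small internal buffer shows the kind of partial flush we suspect:

```csharp
using System;
using System.IO;
using System.Text;

class PartialFlushDemo
{
    static void Main()
    {
        // bufferSize: 1 keeps the FileStream itself effectively unbuffered, so the
        // StreamWriter's own 1024-byte buffer decides when bytes reach the file.
        using var stream = new FileStream("buffer-demo.json", FileMode.Create,
            FileAccess.Write, FileShare.Read, bufferSize: 1);
        using var writer = new StreamWriter(stream, Encoding.UTF8, bufferSize: 1024);

        var largeEntry = "{\"message\":\"" + new string('x', 1500) + "\"}";

        // The entry is longer than the writer's buffer, so roughly the first 1024
        // characters reach the file before the rest of the entry and the newline do.
        writer.Write(largeEntry);

        // A reader tailing the file at this instant sees an unterminated JSON document.
        writer.WriteLine();
        writer.Flush();
    }
}
```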

Please describe the expected behavior?
LogShipper should only attempt to send complete JSON.

If the current behavior is a bug, please provide the steps to reproduce the issue and if possible a minimal demo of the problem

  • With default periodic shipment (5 sec)
  • Entries of ~1500 UTF-8 characters
  • Rate of ~2000 entries/hour

With the above load, we experience roughly one lost/failed log entry per hour. This will most likely vary depending on the I/O speed of the storage.
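For reference, a minimal sketch of the kind of configuration that reproduces it for us (option names as I recall them from 8.4.1; the Elasticsearch URI, buffer path and index format below are placeholders, not our production values):

```csharp
using System;
using Serilog;
using Serilog.Sinks.Elasticsearch;

class Program
{
    static void Main()
    {
        Log.Logger = new LoggerConfiguration()
            .WriteTo.Elasticsearch(new ElasticsearchSinkOptions(new Uri("http://localhost:9200"))
            {
                // Durable mode: events go to elastic_buffer-*.json and are shipped periodically.
                BufferBaseFilename = "./logs/elastic_buffer",
                // Default shipping period (~5 seconds) is where the timing issue shows up.
                Period = TimeSpan.FromSeconds(5),
                IndexFormat = "app-logs-{0:yyyy.MM.dd}"
            })
            .CreateLogger();

        // Large entries (~1500 UTF-8 characters) at roughly 2000/hour give us ~1 failure/hour.
        Log.Information("Response body: {Body}", new string('x', 1500));
        Log.CloseAndFlush();
    }
}
```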

Fredrik-C avatar Feb 09 '21 11:02 Fredrik-C

Thanks for the report. Unfortunately the durable sink is not ideal and is hard to maintain. I have not seen this behaviour myself, however. Did you try the BufferFileSizeLimitBytes option? I am not sure it will help here, though.

mivano avatar Feb 13 '21 22:02 mivano

Did you try the BufferFileSizeLimitBytes option?

My understanding is that BufferFileSizeLimitBytes only controls how large the offline buffer is allowed to grow?

I think the problem is related to how SharedFileSink.AtomicAppend.cs works with its write buffer (EmitOrOverflow), and to a timing issue when the PayloadReader is triggered by the LogShipper.

The buffering AtomicAppend (and the underlying StreamWriter) writes a single large log entry in chunks, and the PayloadReader reads one chunk (or several) but not the complete entry, depending on timing and on the size of the entry.
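Roughly what I mean on the reader side (an illustration, not the actual PayloadReader code): if you read up to the current end of the buffer file while the writer is mid-entry, the last line has no terminating newline and is truncated JSON.

```csharp
using System;
using System.IO;
using System.Text;

static class TailReadDemo
{
    // Reads whatever is currently in the buffer file. A final line without a trailing
    // '\n' is an entry the writer has not finished flushing; shipping it as-is produces
    // exactly the truncated payloads we see in invalid-200-xxx.json.
    public static void Inspect(string bufferFile)
    {
        using var stream = new FileStream(bufferFile, FileMode.Open,
            FileAccess.Read, FileShare.ReadWrite);
        using var reader = new StreamReader(stream, Encoding.UTF8);

        var content = reader.ReadToEnd();
        var endsOnEntryBoundary = content.Length == 0 || content.EndsWith("\n");

        if (!endsOnEntryBoundary)
            Console.WriteLine("Last entry is incomplete and should not be shipped yet.");
    }
}
```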

Fredrik-C avatar Feb 14 '21 11:02 Fredrik-C

@Fredrik-C Is there a workaround to prevent this from happening? We are facing the same issue.

karthi89msc avatar Apr 07 '21 19:04 karthi89msc

@karthi89msc

We haven't found any workaround besides limiting the size of our log entries. But since we log bus messages and HTTP bodies, we will lose some log entries.

Someday (when there is time to spare) I might attempt to solve this in a pull request with a simple JSON validator that validates the payload before it is sent/shipped (the problem is deciding how long to keep a payload in quarantine while waiting for it to become complete).

Validation would go here: https://github.com/serilog/serilog-sinks-elasticsearch/blob/6cafc3e94551e81b2e9e3bf33ecf095764e8b0c4/src/Serilog.Sinks.Elasticsearch/Sinks/ElasticSearch/Durable/LogShipper.cs#L172
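Roughly what I have in mind (just a sketch using System.Text.Json; the class and method names are mine and this is not wired into the sink): parse each buffered line before adding it to the batch, and leave anything that does not parse in the buffer for a later shipment.

```csharp
using System.Text.Json;

static class PayloadValidation
{
    // A line that fails to parse is most likely a partially flushed entry; instead of
    // shipping it (and getting a mapper_parsing_exception back), it could be kept in
    // quarantine until a later read finds the complete entry.
    public static bool IsCompleteJson(string line)
    {
        if (string.IsNullOrWhiteSpace(line))
            return false;

        try
        {
            JsonDocument.Parse(line).Dispose();
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }
}
```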

It could also be caused by the buffer size here: https://github.com/serilog/serilog-sinks-elasticsearch/blob/6cafc3e94551e81b2e9e3bf33ecf095764e8b0c4/src/Serilog.Sinks.Elasticsearch/Sinks/ElasticSearch/Durable/APayloadReader.cs#L105

But I have not had time to experiment with this yet.

Fredrik-C avatar Apr 07 '21 19:04 Fredrik-C

Any chance to have this fixed?

otto-gebb avatar Sep 03 '21 12:09 otto-gebb

@Fredrik-C We have this problem, too. Did you get a chance to look into the problem?

seriouz avatar Sep 09 '21 09:09 seriouz

A fix seems unlikely. As an alternative to this lib, consider writing logs to a file in JSON format and sending them to Elasticsearch via Fluentbit.
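For anyone taking that route, the Serilog side looks roughly like this (assuming the Serilog.Sinks.File and Serilog.Formatting.Compact packages; the file path is a placeholder, and the Fluentbit tail/output configuration is a separate concern):

```csharp
using Serilog;
using Serilog.Formatting.Compact;

class Program
{
    static void Main()
    {
        // Write complete, newline-delimited JSON events to a rolling file;
        // Fluentbit tails the file and forwards the events to Elasticsearch.
        Log.Logger = new LoggerConfiguration()
            .WriteTo.File(new CompactJsonFormatter(), "/var/log/app/log-.json",
                rollingInterval: RollingInterval.Day)
            .CreateLogger();

        Log.Information("Hello from the file-based pipeline");
        Log.CloseAndFlush();
    }
}
```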

otto-gebb avatar Jan 27 '23 16:01 otto-gebb