
Missing data

Open shommersom opened this issue 4 years ago • 4 comments

I was using awslogs to request a large volume of data over an extended period, and when I repeated the exact same request, the resulting log files differed. I used something along the lines of the following command: `awslogs get -G --timestamp log-group "log-stream-*" --filter-pattern="node-id." -s "2021-02-19 11:00" -e "2021-03-08 11:59:59" | grep "node-id.[RT]X" > "node-id.log"`

In one instance, all data from 2021-02-26 through 2021-03-02 was missing from one of the log streams; in another case, only 5 minutes of data was missing.
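One way to spot this kind of loss without eyeballing the files is to scan the fetched event timestamps for unusually large gaps. A minimal sketch (the `find_gaps` helper and the threshold are mine, not part of awslogs):

```python
from datetime import timedelta

def find_gaps(timestamps_ms, max_gap):
    """Return (prev, cur) millisecond-timestamp pairs where consecutive
    events are farther apart than max_gap (a timedelta)."""
    limit_ms = max_gap.total_seconds() * 1000
    gaps = []
    for prev, cur in zip(timestamps_ms, timestamps_ms[1:]):
        if cur - prev > limit_ms:
            gaps.append((prev, cur))
    return gaps

# Example: a jump of over an hour between the 2nd and 3rd events is flagged.
stamps = [0, 60_000, 4_000_000]
print(find_gaps(stamps, timedelta(minutes=30)))  # [(60000, 4000000)]
```

Running this over two fetches of the same time window and diffing the reported gaps makes it easy to see which fetch dropped an interval.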

shommersom avatar Mar 08 '21 14:03 shommersom

I'm seeing something similar: awslogs returns data with some gaps, e.g. of a half-hour worth of events (though the missing data is visible in console). Running again half an hour later though returns the missed data, so that it looks complete. But if it happens at all, who can be sure if the data is ever accurate?

dsagal avatar Apr 08 '21 21:04 dsagal

This has been driving me nuts.

I am deleting an older version of this comment which talked about issues with `aws logs get-log-events` -- that command does not paginate on its own, and even if it returns fewer events than the limit (10k by default), that doesn't mean all events were included. There is also a byte limit, and potentially other reasons for pages to be incomplete. So that's a separate issue that confused me.
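For reference, paginating getLogEvents correctly means repeating the call with `nextForwardToken` until the token stops changing -- a short page does not mean you're done. A sketch against boto3's API shape (the `fetch_all_events` helper is mine, not from awslogs):

```python
def fetch_all_events(client, group, stream):
    """Page through get_log_events; the end is reached only when
    nextForwardToken comes back unchanged, not when a page is short."""
    events, token = [], None
    while True:
        kwargs = dict(logGroupName=group, logStreamName=stream,
                      startFromHead=True)
        if token:
            kwargs["nextToken"] = token
        resp = client.get_log_events(**kwargs)
        events.extend(resp["events"])
        new_token = resp["nextForwardToken"]
        if new_token == token:  # same token twice: no more pages
            break
        token = new_token
    return events
```

With a real `boto3.client("logs")` this loops until CloudWatch repeats the token; the caveat discussed in this thread is that even a complete pagination loop does not guarantee every event is present.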

I don't have a consistent way to reproduce the problem of missed messages. I've reproduced it by using filterLogEvents directly, as well as getLogEvents (both with pagination). The most common way was to fetch several days of logs from all streams in a group. Compared to logs fetched for another overlapping time range, I would notice that some interval of messages from some streams was missing in one of the fetches, but not always the same ones. However, doing it today returned all messages, so perhaps there was a transient problem on the AWS side.

I wasn't able to find any issues reported on the AWS side, and I'm not sure where it should be reported. It's hard without a way to reproduce.

I no longer have any confidence in the theory I came up with, so I'm crossing it out but leaving it here for reference. ~UPDATE: I think I understand the cause. AWS documentation lists a limit of 10,000 messages or 1MB. In reality, it looks like these limits are applied at different stages, in a somewhat-broken way. If a page of messages exceeds the byte limit, the data that didn't fit gets omitted. Pagination offers the next page, but based on message count, and the next page does not include the data that was omitted due to the byte limit.~

~The solution that seems to work is to use a lower limit (e.g. 1000) to ensure each returned page fits within 1MB, plus pagination. An extra stumbling block is that this helps with getLogEvents, but not with filterLogEvents. awslogs relies on filterLogEvents, and a smaller limit doesn't help there. My guess is that the implementation of filterLogEvents on the AWS side suffers internally from the same problem.~
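For anyone who wants to experiment with the struck-out workaround anyway, paging filterLogEvents with an explicit small limit looks like this (the `filter_all_events` helper is mine; per the comment above, this did not resolve the gaps with filterLogEvents in my testing):

```python
def filter_all_events(client, group, pattern, start_ms, end_ms,
                      page_size=1000):
    """Page through filter_log_events with a small page size, following
    nextToken until the response no longer includes one."""
    events, token = [], None
    while True:
        kwargs = dict(logGroupName=group, filterPattern=pattern,
                      startTime=start_ms, endTime=end_ms, limit=page_size)
        if token:
            kwargs["nextToken"] = token
        resp = client.filter_log_events(**kwargs)
        events.extend(resp["events"])
        token = resp.get("nextToken")
        if not token:  # absent token means the last page was returned
            break
    return events
```

Unlike getLogEvents, filterLogEvents signals the end by omitting `nextToken` entirely rather than repeating it.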

dsagal avatar Apr 09 '21 19:04 dsagal

I'm having the same issue: I used `--start='1h' --watch` to observe log streams live, and I noticed that one log entry was missing. I then re-issued the command and the entry was there. This makes it difficult to trust the output, unfortunately.

fievelk avatar Oct 06 '23 08:10 fievelk

@jorgebastida is awslogs currently actively maintained? I see that the last commits are from 2020, but I'm not sure whether that's because the project is considered stable and complete, or whether it's no longer maintained. So I'm not sure whether these issues are going to be taken into consideration :)

fievelk avatar Oct 09 '23 09:10 fievelk