
Does not read all log streams

Najdzionix opened this issue 6 years ago · 12 comments

For example, when my log group contains more than 31 log streams, cloudwatch-logs reads only a random 30 of them. It seems to me the plugin ignores the last log stream. My configuration:

```
cloudwatch_logs {
  log_group => [ "test-app" ]
  log_group_prefix => true
  aws_credentials_file => "/aws/crednetials/AWSCredentials.yaml"
  region => "eu-west-1"
  type => "test"
}
```

Is there a limit on how many log streams are read?

Najdzionix · Mar 19 '18

I suspect the problem is located in the method `process_group`, where it searches for log events: `resp = @cloudwatch.filter_log_events(params)`. The AWS documentation says:

> The maximum number of events to return. The default is 10,000

So when we try to read/search a log group with a large number of log streams and a lot of data, we get only a small piece of it (10k events) per call. We get a `next_token` to read the next page, but we also set `start_time`, which was already updated by the first search. That can cause old events to be lost. I am not a Ruby programmer, but moving the `_sincedb_write` line outside the loop might help. Or, when we have a `next_token`, don't set `start_time` at all.
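
(For illustration, a minimal standalone sketch of the pagination pattern being described, calling the `aws-sdk-cloudwatchlogs` gem directly; the region, log group, and start time below are placeholders, not values from the plugin:)

```ruby
require 'aws-sdk-cloudwatchlogs'

client = Aws::CloudWatchLogs::Client.new(region: 'eu-west-1') # placeholder region

log_group = 'test-app'                       # placeholder log group
start_time = (Time.now.to_i - 3600) * 1000   # epoch millis; fixed for the whole run

next_token = nil
loop do
  # Pass start_time only on the first call; afterwards follow next_token
  # without touching start_time, so pagination is never reset mid-run.
  params = { log_group_name: log_group, interleaved: true }
  params[:start_time] = start_time if next_token.nil?
  params[:next_token] = next_token unless next_token.nil?

  resp = client.filter_log_events(params)
  resp.events.each { |event| puts "#{event.log_stream_name}: #{event.message}" }

  next_token = resp.next_token
  break if next_token.nil? # no more pages
end
```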

Najdzionix · Mar 22 '18

It seems to me I fixed the issue. I changed the `process_group` method a little:

```ruby
private
def process_group(group)
  next_token = nil
  loop do
    if !@sincedb.member?(group)
      @sincedb[group] = 0
    end
    if next_token.nil?
      # First page: start from the sincedb position.
      params = {
        :log_group_name => group,
        :start_time => @sincedb[group],
        :interleaved => true,
        :next_token => next_token
      }
    else
      # Subsequent pages: follow the token only, without start_time,
      # so an updated sincedb cannot reset the pagination.
      params = {
        :log_group_name => group,
        :interleaved => true,
        :next_token => next_token
      }
    end
    resp = @cloudwatch.filter_log_events(params)

    resp.events.each do |event|
      process_log(event, group)
    end

    _sincedb_write

    next_token = resp.next_token
    break if next_token.nil?
  end
  @priority.delete(group)
  @priority << group
end # def process_group
```
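
A note on the design choice above: an AWS pagination token is tied to the query that produced it, so the other parameters should stay consistent while a `next_token` is being followed. Because processing each event advances `@sincedb[group]`, re-sending `:start_time => @sincedb[group]` on later pages would move the window mid-pagination and could drop older events; omitting `start_time` once a token is in hand keeps the query consistent.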

Najdzionix · Mar 23 '18

Did you create a pull request?

LM1LC3N7 · Jul 04 '18

Hi @LM1LC3N7, I cannot create a branch on this repo:

```
remote: Permission to lukewaite/logstash-input-cloudwatch-logs.git denied to Najdzionix.
fatal: unable to access 'https://github.com/lukewaite/logstash-input-cloudwatch-logs.git/': The requested URL returned error: 403
```

I attach a git patch: 0001-fix-for-reading-log-stream-for-more-than-10k-logs.patch.zip

Najdzionix · Jul 04 '18

Same issue! :/

manaspaldhe12 · Jul 13 '18

I'm pretty sure it works; in my case it works correctly. Are you sure you used the fixed version of the plugin? I had a lot of problems installing the plugin manually on Logstash.
I suggest adding log lines to be sure you are running the fixed plugin, for example:

```ruby
if next_token.nil?
  params = {
    :log_group_name => group,
    :start_time => @sincedb[group],
    :interleaved => true
  }
  logger.info("No token. Since db #{parse_time(@sincedb[group])} for group: #{group}")
else
  params = {
    :log_group_name => group,
    :interleaved => true,
    :next_token => next_token
  }
  logger.info("With token. Since db #{parse_time(@sincedb[group])} for group: #{group}")
end
```

Najdzionix · Jul 14 '18

@Najdzionix can you create a fork in order to apply your patch? If @lukewaite doesn't apply it, that could be a solution.

LM1LC3N7 · Aug 01 '18

Hi @LM1LC3N7, I forked the repository and you can check it here: https://github.com/Najdzionix/logstash-input-cloudwatch-logs.git

Najdzionix · Aug 04 '18

@lukewaite I was going to open this exact same issue (which seems to happen with a large number of logs). I sent a pull request with the change that @Najdzionix made. I've noticed there hasn't really been any activity in a while; are you still maintaining this library?

camerondavison · Jun 28 '19

I've added some changes to the plugin script, because with the changes from https://github.com/lukewaite/logstash-input-cloudwatch-logs/pull/78 the plugin downloads logs nonstop and some log events were duplicated. With my changes I get almost all log events (the plugin can lose 1-2 events per 100 thousand), but if a stream is intensive (more than 50 thousand events per hour) you will get logs with some delay (5-10 minutes).

VladislavAnd · Jun 04 '20

@VladislavAnd I have a very intensive stream (close to 4 million events per hour) and the logs are delayed by close to an hour. Is there a way to improve performance? I have tried assigning more workers to the pipeline and that hasn't helped much.

kjamsheed · Aug 04 '20

@kjamsheed It is caused by the strange behavior of AWS Logs pagination. If you request logs manually, you'll notice that AWS often sends pages with an empty answer but a next token. The script has to request the next page, and these round trips take a lot of time. It may be worth asking AWS support to find out the reason for this.
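
(For illustration: a small standalone probe, under the same assumptions as the earlier sketch — the `aws-sdk-cloudwatchlogs` gem with a placeholder region and log group — that counts how many pages come back empty while still carrying a `next_token`:)

```ruby
require 'aws-sdk-cloudwatchlogs'

client = Aws::CloudWatchLogs::Client.new(region: 'eu-west-1') # placeholder region

empty_pages = 0
total_events = 0
next_token = nil
loop do
  params = { log_group_name: 'test-app', interleaved: true } # placeholder group
  params[:start_time] = (Time.now.to_i - 3600) * 1000 if next_token.nil?
  params[:next_token] = next_token unless next_token.nil?

  resp = client.filter_log_events(params)
  total_events += resp.events.size
  # An "empty" page: no events, but AWS still hands back a token,
  # forcing another round trip before any data arrives.
  empty_pages += 1 if resp.events.empty? && resp.next_token

  next_token = resp.next_token
  break if next_token.nil?
end
puts "#{total_events} events, #{empty_pages} empty pages"
```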

VladislavAnd · Aug 06 '20