logstash-filter-geoip

Reopen database on errors

Open splitice opened this issue 9 years ago • 11 comments

When the database is hosted on a NFS share it is possible for the handle to become stale and the database to need to be re-opened.

Currently this just results in an avalanche of errors:

:message=>"Unknown error while looking up GeoIP data", :exception=>#<IOError: Stale NFS file handle>

At a minimum the Geoip database should be closed (to be re-opened) in case of error.

splitice avatar Jan 06 '16 01:01 splitice
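
The minimal behaviour asked for here can be sketched as a wrapper around the reader. This is a hedged illustration, not the plugin's actual code; `ReopeningGeoip`, `opener`, and `look_up` are hypothetical names:

```ruby
# Hypothetical sketch: wrap the GeoIP reader so that any lookup error
# closes the (possibly stale) handle and reopens it for the next event.
class ReopeningGeoip
  # opener: any callable that builds a fresh database reader
  def initialize(opener)
    @opener = opener
    @db = opener.call
  end

  def lookup(ip)
    @db.look_up(ip)
  rescue IOError, SystemCallError
    @db.close rescue nil   # drop the stale handle
    @db = @opener.call     # fresh handle for subsequent events
    nil                    # skip enrichment for this one event
  end
end
```

One failed event loses its GeoIP fields, but the pipeline keeps moving and the handle is refreshed instead of erroring forever.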

Out of interest, why are you storing it on an NFS share?

markwalkom avatar Jan 06 '16 01:01 markwalkom

So that the 7 Logstash servers all:
a) have the same version of the database
b) can be easily updated at the same time

splitice avatar Jan 06 '16 01:01 splitice

There are many issues with doing this, though. Do we add some kind of heartbeat to check the file exists when we aren't using it during processing? How do we deal with the rest of the pipeline while we retry access to the file if it disappears? What if it never returns?

The better option, in my opinion, would be to use automated deployment tools (Puppet/Chef/Ansible/Salt/etc.) to keep the database consistent, and not have to worry about NFS at all.

Maybe someone else can comment with their thoughts as well :)

markwalkom avatar Jan 06 '16 01:01 markwalkom

Unfortunately, we have other distributed services running on each Logstash machine that don't fit well with that sort of adjustment, due to a lack of add/remove support in the distributed database, so we can't currently use that workflow.

NFS had worked well for quite a while until we added GeoIP. The I/O is minimal, and it is very easy to deploy and manage.

I would be quite happy to have GeoIP skipped, or even a cooldown applied, if the database can't be opened. Right now that message is spewed out forever, at a rate of multiple GB/min in our case, filling the log storage.

splitice avatar Jan 06 '16 02:01 splitice
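
The cooldown idea could look roughly like this (a hypothetical sketch, not part of the plugin; `CooldownGeoip` and `look_up` are made-up names):

```ruby
# Hypothetical cooldown sketch: after a lookup failure, skip GeoIP
# lookups (and the matching log line) for a fixed window instead of
# logging an error for every single event.
COOLDOWN_SECONDS = 60

class CooldownGeoip
  def initialize(db, cooldown = COOLDOWN_SECONDS)
    @db = db
    @cooldown = cooldown
    @disabled_until = nil
  end

  def lookup(ip)
    return nil if @disabled_until && Time.now < @disabled_until
    @db.look_up(ip)
  rescue IOError, SystemCallError
    @disabled_until = Time.now + @cooldown  # back off; stop the log flood
    nil
  end
end
```

Events processed during the window simply go through un-enriched; one error per window is logged instead of one per event.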

I think a cooldown would be a good idea, but given the current handling I still feel this simple solution is a significant step forward:

  1. It produces no worse handling than the current implementation
  2. It produces better handling for a bunch of file-related issues, especially on remote (NFS, SSHFS, etc.) or distributed filesystems (GlusterFS)

I have not tested the commit yet; I'll need to figure out how to stale a file handle first. It's simple, though.

https://github.com/splitice/logstash-filter-geoip/commit/6d58ea78f7d91b25264b627896147b9381864bea

splitice avatar Jan 06 '16 02:01 splitice

#41 could also be useful

splitice avatar Jan 06 '16 02:01 splitice

I don't think we really expect the file to be changed while Logstash is running, so if that's what you're doing, as a workaround for now, you may need to restart Logstash in order to update your GeoIP file.

I strongly discourage NFS due to its behavioral problems. However, in this case, maybe we can catch the specific "IOError: Stale NFS file handle" error and try to reopen the file in that case only (not all IOErrors, just the stale-file-handle one).

jordansissel avatar Jan 06 '16 22:01 jordansissel
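
A hedged sketch of that narrowing (hypothetical helper name): under MRI a stale handle surfaces as `Errno::ESTALE`, while under JRuby, which Logstash runs on, it can arrive as a plain `IOError` whose message names the cause, as in the log line quoted above, so both shapes are checked:

```ruby
# Hypothetical predicate: treat only the stale-NFS-handle case as
# recoverable; any other IOError should propagate as before.
def stale_handle?(e)
  (e.is_a?(SystemCallError) && e.errno == Errno::ESTALE::Errno) ||
    (e.is_a?(IOError) && e.message.include?("Stale NFS file handle"))
end
```

A rescue block could then close and reopen the database only when `stale_handle?(e)` is true, and re-raise otherwise.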

I have never mentioned changing the file while running. I too am unsure if that would work.

No other component (logstash-core, etc.) has any trouble with NFS; we store Logstash, our plugins, and its configuration this way.

splitice avatar Jan 06 '16 23:01 splitice

And I am not certain how to get the errno from an IOError in Ruby. It's not a language I am overly familiar with.

splitice avatar Jan 06 '16 23:01 splitice
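
For what it's worth, in MRI the errno lives on `SystemCallError` (of which the `Errno::*` classes are subclasses), not on `IOError`; a plain `IOError` carries only its message. A quick illustration:

```ruby
# Errno classes are SystemCallError subclasses and expose #errno:
e = Errno::ESTALE.new
e.is_a?(SystemCallError)         # true
e.errno == Errno::ESTALE::Errno  # true: the platform's ESTALE number

# A plain IOError (what JRuby logged above) has no #errno method,
# so matching the exception message is the remaining option there:
IOError.new("Stale NFS file handle").respond_to?(:errno)  # false
```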

Did you try the preload option?

Check at https://github.com/logstash-plugins/logstash-filter-geoip/pull/63/files.

ebuildy avatar Apr 23 '16 12:04 ebuildy

No, I haven't.

I ended up setting up a shell script to rsync the file from NFS to temporary in-memory storage, and load it from there.

splitice avatar Apr 23 '16 12:04 splitice
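
That workaround might look roughly like this (a sketch with made-up paths; the tmpfs mount point and database filename are assumptions):

```shell
#!/bin/sh
# Copy the GeoIP database from the NFS share into in-memory storage
# (tmpfs), then point the geoip filter's `database` option at the
# local copy. Paths below are hypothetical.
SRC=/mnt/nfs/geoip/GeoLiteCity.dat
DST=/dev/shm/GeoLiteCity.dat

rsync --checksum "$SRC" "$DST"
```

Run periodically (e.g. from cron) so updates on the share still propagate; at event time, lookups never touch NFS.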