openwayback Performance issues in OpenWayback

Having spent some time looking at performance issues for the LoC, I have come up with a couple of possible improvements to OpenWayback (see https://gist.github.com/anjackson/06971ff43e50645e3f2f for details).

Firstly, it's probably worth adding to the documentation that Tomcat's performance with many threads will tank if you are using a ConsoleHandler for logging: threads contend to write to the console, which causes blocks and waits. Just log to a file and tail that instead.
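As a minimal sketch of the same idea in plain java.util.logging, the hypothetical helper below removes any ConsoleHandler from the root logger and attaches a FileHandler in its place. Tomcat itself is usually configured through logging.properties rather than in code; the class name and the log-file pattern are illustrative assumptions.

```java
import java.io.IOException;
import java.util.logging.ConsoleHandler;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/** Hypothetical helper: send root-logger output to a file instead of the console. */
final class FileOnlyLogging {
    static void useFileInsteadOfConsole(String pattern) throws IOException {
        Logger root = Logger.getLogger("");
        // getHandlers() returns a copy, so removing while iterating is safe.
        for (Handler h : root.getHandlers()) {
            if (h instanceof ConsoleHandler) {
                root.removeHandler(h);
                h.close(); // flushes; ConsoleHandler does not close System.err
            }
        }
        FileHandler file = new FileHandler(pattern, true); // append to an existing log
        file.setFormatter(new SimpleFormatter());          // plain text rather than the default XML
        root.addHandler(file);
    }
}
```

For example, `FileOnlyLogging.useFileInsteadOfConsole("%t/wayback%u.log")` writes to the JVM temp directory, and the file can then be followed with `tail -f`.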

(EDIT: I've just remembered another issue we found: the default session timeout is quite long, and when you are getting hit by lots of robots that don't retain cookies/sessions, this can lead to an unsightly build-up of session objects that consume a lot of RAM. Shortening the default session timeout is probably a good idea here.)
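A back-of-envelope estimate using Little's law shows why the timeout matters: if every robot request creates a fresh session that is never reused, each session lives for the full timeout, so the number of live sessions is roughly the new-session rate multiplied by the timeout. The sketch below is illustrative only; the rate of 10 new sessions per second is a hypothetical figure, compared against Tomcat's default 30-minute timeout and a 5-minute one.

```java
/** Hypothetical back-of-envelope helper based on Little's law (L = lambda * W). */
final class SessionEstimate {
    /**
     * Approximate steady-state number of live sessions, assuming every request
     * starts a new session that is never reused, so each one survives for the
     * whole timeout before the container expires it.
     */
    static long liveSessions(double newSessionsPerSecond, int timeoutMinutes) {
        return Math.round(newSessionsPerSecond * timeoutMinutes * 60.0);
    }
}
```

With the assumed rate, `liveSessions(10, 30)` gives 18,000 live sessions and `liveSessions(10, 5)` gives 3,000, a sixfold reduction in session memory whatever the per-session size turns out to be.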

Secondly, the CDX file binary search lookup code (FlatFile.java) is terribly inefficient and could be improved in a number of different ways.

The current implementation opens and closes a RandomAccessFile object once for every CDX file for every single request, and I think this is why performance is poor with tens of CDX files. It would be better to re-use the RandomAccessFile objects rather than creating/destroying them (which is not only a heavy JVM overhead, but may also stress the OS as it struggles to cope with the rapid consumption and turnover of file handles).

However, RandomAccessFile is not thread-safe, so in the context of OpenWayback, we would have to wrap them up as ThreadLocal variables. This would mean that each thread would open a separate RandomAccessFile object for each CDX file, hold them open and re-use them. The current code expects to close the input streams that are generated from the files, so that may need to be changed/prevented in order to allow re-use.
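A minimal sketch of the per-thread approach, assuming CDX files are identified by path string: each thread keeps its own map of open RandomAccessFile handles, opening a file on first use and re-using it afterwards. The class and method names are hypothetical, not part of OpenWayback.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-thread cache of read-only RandomAccessFile handles, one per CDX path. */
final class ThreadLocalCdxFiles {
    // Each thread gets its own map, so no handle (or its file pointer) is ever shared.
    private static final ThreadLocal<Map<String, RandomAccessFile>> FILES =
            ThreadLocal.withInitial(HashMap::new);

    /** Returns the calling thread's handle for path, opening it on first use. Callers must not close it. */
    static RandomAccessFile get(String path) {
        return FILES.get().computeIfAbsent(path, p -> {
            try {
                return new RandomAccessFile(p, "r");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Closes and forgets every handle owned by the calling thread. */
    static void closeAllForCurrentThread() throws IOException {
        Map<String, RandomAccessFile> files = FILES.get();
        for (RandomAccessFile raf : files.values()) {
            raf.close();
        }
        files.clear();
        FILES.remove();
    }
}
```

Because Tomcat's request threads are pooled and long-lived, the handles stay open across requests, which is the point; the cost is one open file descriptor per thread per CDX file, and the existing code paths that close their input streams would have to stop closing these handles.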

A second issue is that RandomAccessFile's readLine method is built on unbuffered single-byte reads, which are very slow on modern kit. We could re-use an existing BufferedRandomAccessFile implementation, like this one (no longer used in Cassandra):

https://github.com/facebookarchive/cassandra/blob/master/src/org/apache/cassandra/io/BufferedRandomAccessFile.java
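The essence of buffering can be illustrated without the Cassandra class. The hypothetical BufferedLineReader below is a sketch, not that implementation: it fills a byte array with one bulk RandomAccessFile.read call and scans it for '\n', instead of reading one byte at a time as RandomAccessFile.readLine does. It assumes '\n'-terminated UTF-8 lines, as in CDX files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical minimal buffered line reader over a RandomAccessFile. */
final class BufferedLineReader {
    private final RandomAccessFile raf;
    private final byte[] buf;
    private int pos = 0; // index of the next unread byte in buf
    private int len = 0; // number of valid bytes currently in buf

    BufferedLineReader(RandomAccessFile raf, int bufferSize) { // bufferSize must be > 0
        this.raf = raf;
        this.buf = new byte[bufferSize];
    }

    /** Moves to an absolute file offset and discards any buffered bytes. */
    void seek(long offset) throws IOException {
        raf.seek(offset);
        pos = 0;
        len = 0;
    }

    /** Returns the bytes up to the next '\n' (excluded) as UTF-8, or null at end of file. */
    String readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawAny = false;
        while (true) {
            if (pos == len) {
                len = raf.read(buf, 0, buf.length); // one bulk read refills the buffer
                pos = 0;
                if (len <= 0) { // end of file
                    len = 0;
                    return sawAny ? new String(line.toByteArray(), StandardCharsets.UTF_8) : null;
                }
            }
            int start = pos;
            while (pos < len && buf[pos] != '\n') pos++;
            line.write(buf, start, pos - start);
            sawAny = true;
            if (pos < len) { // found the terminator
                pos++;       // consume '\n'
                return new String(line.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

Since the underlying file pointer runs ahead of the logical read position, all seeks should go through the reader's own seek method rather than the RandomAccessFile directly.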

Or, if we can expect most folks to be on 64-bit platforms, we could move to a version that uses OS memory mapping, which should make things very fast as long as the index is not much larger than RAM. For example:

http://dsiutils.di.unimi.it/docs/it/unimi/dsi/io/ByteBufferInputStream.html

This could replace the current code if combined with the binary search logic given here:

http://stackoverflow.com/questions/736556/binary-search-in-a-sorted-memory-mapped-file-in-java
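A minimal sketch of that combination using only the JDK's FileChannel.map rather than dsiutils: binary-search the byte offsets of a sorted, '\n'-delimited file, snapping each probe forward to the next line start, to find the first line whose prefix is greater than or equal to the key. The class name MappedCdxSearch is hypothetical, and the sketch assumes the file is under 2 GB so that it fits in a single MappedByteBuffer; a larger index would need to be mapped in several chunks.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical binary search over a sorted, newline-delimited file mapped into memory. */
final class MappedCdxSearch {
    private final MappedByteBuffer buf;
    private final int limit;

    MappedCdxSearch(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size > Integer.MAX_VALUE) throw new IOException("sketch only maps files under 2 GB");
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size); // stays valid after the channel closes
            limit = (int) size;
        }
    }

    /** Smallest line start >= pos (pos itself if it already begins a line). */
    private int nextLineStart(int pos) {
        if (pos == 0) return 0;
        for (int i = pos - 1; i < limit; i++) {
            if (buf.get(i) == '\n') return i + 1;
        }
        return limit;
    }

    /** Compares the line at start with key over key.length bytes, as unsigned bytes. */
    private int compareLinePrefix(int start, byte[] key) {
        for (int i = 0; i < key.length; i++) {
            int p = start + i;
            if (p >= limit || buf.get(p) == '\n') return -1; // line is a proper prefix of key
            int diff = (buf.get(p) & 0xff) - (key[i] & 0xff);
            if (diff != 0) return diff;
        }
        return 0; // line starts with key
    }

    /** Offset of the first line whose prefix is >= key, or the file length if there is none. */
    int lowerBound(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = limit;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int s = nextLineStart(mid);
            if (s == limit || compareLinePrefix(s, k) >= 0) hi = mid; else lo = mid + 1;
        }
        return nextLineStart(lo);
    }

    /** The line starting at offset start, without its '\n'. */
    String lineAt(int start) {
        int end = start;
        while (end < limit && buf.get(end) != '\n') end++;
        byte[] out = new byte[end - start];
        for (int i = 0; i < out.length; i++) out[i] = buf.get(start + i);
        return new String(out, StandardCharsets.UTF_8);
    }
}
```

A lookup would call lowerBound once and then read lines forward from that offset while they still start with the search key.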

However, just using static ThreadLocal RandomAccessFiles should give a major boost, I think, so I'm having a go at that first.

Jul 15 '15 20:07 anjackson

Thanks Andy. Would it be worth noting in the Configuration Documentation that, for improved performance, one might consider:

- Commenting out the default .handlers = 1catalina.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler line in $CATALINA_HOME/conf/logging.properties
- Reducing the default session-timeout value of 30 (minutes) in $CATALINA_HOME/conf/web.xml

These two changes made a big difference for us; we reduced the session-timeout to 5 minutes. It may also be worth noting that, regarding the second point above, the session-timeout can be set in three different ways:

1. programmatically, by calling HttpSession.setMaxInactiveInterval(int) on a session (this value is in seconds and applies only to that session)
2. via a definition in $CATALINA_HOME/webapps/ROOT/WEB-INF/web.xml (assuming a ROOT installation of OpenWayback)
3. via a definition in $CATALINA_HOME/conf/web.xml

As far as I understand, option 1 will override 2, and 2 will override 3. The web.xml that ships with Open Wayback does not define session-timeout. We are using option 3.

Jul 16 '15 12:07 arderyp

https://github.com/nclarkekb/antiaction-common-datastructures

I rewrote my caching flat-file lookup some weeks ago, extracted it from the original project and placed it in this repository. (I will probably merge it with some other data-structure code I have at some point.) By default it caches 16 levels of the binary lookup and splits the file into 2^13-byte (8 KB) pages. It uses a custom prefix string matcher and a ByteBuffer-based buffered reader on top of RandomAccessFile. (Presumably large disks these days have at least 8 KB blocks.)

From what I have seen in our production environment, it is pretty fast, even with 7+ TB of data.
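To illustrate the general idea of caching the top levels of a binary lookup (a hypothetical sketch, not the code in that repository): search over fixed-size pages by comparing against the first key of each probed page, and memoise those keys only for probes made in the first cachedLevels steps. Because the search is deterministic, at most 2^cachedLevels - 1 distinct pages can be probed at those depths, so the cache stays bounded. The page reader is passed in as a function and is assumed to return the first complete line starting in a given page.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

/** Hypothetical page-level binary search that caches keys seen in its top levels. */
final class CachedPageSearch {
    private final int numPages;
    private final int cachedLevels;
    private final IntFunction<String> readFirstKey; // assumed: reads one page, returns its first full line
    private final Map<Integer, String> cache = new ConcurrentHashMap<>(); // shared by request threads

    CachedPageSearch(int numPages, int cachedLevels, IntFunction<String> readFirstKey) {
        this.numPages = numPages;
        this.cachedLevels = cachedLevels;
        this.readFirstKey = readFirstKey;
    }

    /** First key of a page, memoised only when probed within the top cachedLevels levels. */
    private String firstKey(int page, int depth) {
        if (depth >= cachedLevels) {
            return readFirstKey.apply(page); // deep probes always hit the disk
        }
        return cache.computeIfAbsent(page, p -> readFirstKey.apply(p));
    }

    /** Last page whose first key is strictly less than key, or 0 if there is none (ASCII keys assumed). */
    int findPage(String key) {
        int lo = 0;
        int hi = numPages - 1;
        int depth = 0;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so that lo = mid always makes progress
            if (firstKey(mid, depth++).compareTo(key) < 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }
}
```

A lookup then scans forward from the returned page. The comparison is strict because lines matching the key can begin near the end of the page before one whose first key equals it.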

Jul 22 '15 23:07 nclarkekb
