Performance issues in OpenWayback
Having spent some time looking at performance issues for the LoC, a couple of possible improvements to OpenWayback came up (see here for details https://gist.github.com/anjackson/06971ff43e50645e3f2f).
Firstly, it's probably worth adding to the documentation that the performance of Tomcat with many threads will probably tank if you are using a ConsoleHandler for logging. Thread contention to write to the console will cause blocks and waits. Just log to a file and tail that instead.
(EDIT: I've just remembered another issue we found: the default session timeout is quite long, and when you are getting hit by lots of robots that don't remember cookies/sessions, this can lead to an unsightly build-up of session handles that consume a lot of RAM. Shortening the default session timeout is probably a good idea here.)
Secondly, the CDX file binary search lookup code (FlatFile.java) is terribly inefficient and could be improved in a number of different ways.
The current implementation opens and closes a RandomAccessFile object once for every CDX file for every single request, and I think this is why performance is poor with tens of CDX files. It would be better to re-use the RandomAccessFile objects rather than creating/destroying them (which is not only a heavy JVM overhead, but may also stress the OS as it struggles to cope with the rapid consumption and turnover of file handles).
However, RandomAccessFile is not thread-safe, so in the context of OpenWayback, we would have to wrap them up as ThreadLocal variables. This would mean that each thread would open a separate RandomAccessFile object for each CDX file, hold them open and re-use them. The current code expects to close the input streams that are generated from the files, so that may need to be changed/prevented in order to allow re-use.
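As a rough illustration of the ThreadLocal approach described above, the sketch below keeps one read-only RandomAccessFile per thread per CDX path. The class name ThreadLocalCdxFiles and its methods are hypothetical and are not taken from the OpenWayback code base. It also sidesteps the question of how the existing callers close their streams.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of OpenWayback): per-thread cache of open read-only handles. */
final class ThreadLocalCdxFiles {
    // Each thread gets its own map of path -> open handle,
    // so no RandomAccessFile is ever shared between threads.
    private static final ThreadLocal<Map<String, RandomAccessFile>> HANDLES =
            ThreadLocal.withInitial(HashMap::new);

    private ThreadLocalCdxFiles() {}

    /** Returns the calling thread's handle for the given path, opening it on first use. */
    static RandomAccessFile get(String path) {
        return HANDLES.get().computeIfAbsent(path, p -> {
            try {
                return new RandomAccessFile(p, "r");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
    }

    /** Closes and forgets every handle held by the calling thread. */
    static void closeAllForCurrentThread() throws IOException {
        Map<String, RandomAccessFile> handles = HANDLES.get();
        for (RandomAccessFile raf : handles.values()) {
            raf.close();
        }
        handles.clear();
    }
}
```

A caller would use ThreadLocalCdxFiles.get(path) followed by seek(...) instead of constructing a new RandomAccessFile per request. One caveat with this design: the handles live as long as the worker threads do, so in a pooled container such as Tomcat something would need to call closeAllForCurrentThread() (or otherwise release the handles) when CDX files are replaced or the webapp is redeployed.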
A second issue is that the RandomAccessFile readLine function is built on unbuffered single-byte reads, which are very slow on modern kit. We could re-use an existing BufferedRandomAccessFile implementation, like this one (no longer used in Cassandra):
https://github.com/facebookarchive/cassandra/blob/master/src/org/apache/cassandra/io/BufferedRandomAccessFile.java
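To make the difference concrete, here is a minimal sketch of reading a line with block reads instead of one read() call per byte. It is not the Cassandra class linked above; the class name BufferedLineReads, the method readLineAt and the bufferSize parameter are all made up for illustration, and it only treats '\n' as a line terminator.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: line reads over RandomAccessFile using block reads. */
final class BufferedLineReads {
    private BufferedLineReads() {}

    /**
     * Reads the line that starts at the given offset, fetching bufferSize bytes
     * per read call. Returns null if the offset is at or past end of file.
     * The file position afterwards is wherever the last block read ended,
     * not the end of the line, so callers should seek before the next read.
     */
    static String readLineAt(RandomAccessFile raf, long offset, int bufferSize) throws IOException {
        raf.seek(offset);
        byte[] buf = new byte[bufferSize];
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawData = false;
        int n;
        while ((n = raf.read(buf)) > 0) {
            sawData = true;
            for (int i = 0; i < n; i++) {
                if (buf[i] == '\n') {
                    // Keep the bytes before the newline and stop.
                    line.write(buf, 0, i);
                    return decode(line);
                }
            }
            // No newline in this block: keep all of it and read the next block.
            line.write(buf, 0, n);
        }
        // Last line without a trailing newline, or nothing at all.
        return sawData ? decode(line) : null;
    }

    private static String decode(ByteArrayOutputStream bytes) {
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

With a buffer of a few kilobytes, a typical CDX line should be fetched in a single read call rather than one call per character.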
Or, if we can expect most folks to be on 64-bit platforms, we could move to a version that allows OS memory mapping to be used, which will make things very fast if the index size is not too far off RAM size, e.g.
http://dsiutils.di.unimi.it/docs/it/unimi/dsi/io/ByteBufferInputStream.html
Which could replace the current code by combining it with the binary search logic given here:
http://stackoverflow.com/questions/736556/binary-search-in-a-sorted-memory-mapped-file-in-java
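The memory-mapped idea can be sketched as follows. This is not the code from either link: MappedSortedFile, lowerBound and lineAt are hypothetical names, the file is assumed to be sorted in unsigned byte order with '\n' line endings, and a single MappedByteBuffer limits it to files under 2 GiB (larger indexes would need several mappings, which is what a class like the ByteBufferInputStream above is for).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch: binary search over the lines of a sorted, memory-mapped file. */
final class MappedSortedFile {
    private final ByteBuffer buf;
    private final int size;

    MappedSortedFile(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            if (ch.size() > Integer.MAX_VALUE) {
                throw new IOException("file too large for a single mapping");
            }
            size = (int) ch.size();
            // The mapping stays valid after the channel is closed.
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
        }
    }

    /** Offset of the first line that compares >= key, or the file size if there is none. */
    int lowerBound(String key) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        int lo = 0, hi = size; // both are always line starts (or the file size)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int start = lineStart(mid); // start of the line containing mid
            int end = lineEnd(start);   // index of its '\n', or size
            if (compare(start, end, k) < 0) {
                lo = Math.min(end + 1, size); // answer lies after this line
            } else {
                hi = start;                   // this line is a candidate
            }
        }
        return lo;
    }

    /** Returns the line beginning at offset without its newline, or null at end of file. */
    String lineAt(int offset) {
        if (offset >= size) return null;
        int end = lineEnd(offset);
        byte[] bytes = new byte[end - offset];
        for (int i = 0; i < bytes.length; i++) bytes[i] = buf.get(offset + i);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    private int lineStart(int pos) {
        while (pos > 0 && buf.get(pos - 1) != '\n') pos--;
        return pos;
    }

    private int lineEnd(int pos) {
        while (pos < size && buf.get(pos) != '\n') pos++;
        return pos;
    }

    // Unsigned byte-wise comparison of the line [start, end) against the key.
    private int compare(int start, int end, byte[] key) {
        int len = Math.min(end - start, key.length);
        for (int i = 0; i < len; i++) {
            int a = buf.get(start + i) & 0xff;
            int b = key[i] & 0xff;
            if (a != b) return a - b;
        }
        return (end - start) - key.length;
    }
}
```

A prefix lookup would call lowerBound(prefix) and then walk forward with lineAt(...) while the lines still start with the prefix. Whether this beats buffered RandomAccessFile reads in practice depends on how much of the index fits in the OS page cache.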
However, just using static ThreadLocal RandomAccessFiles should give a major boost, I think, so I'm having a go at that first.
Thanks Andy. Would it be worth noting in the Configuration Documentation that, for improved performance, one might consider:
- Commenting out the default .handlers = 1catalina.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler line in $CATALINA_HOME/conf/logging.properties
- Reducing the default session-timeout value of 30 in $CATALINA_HOME/conf/web.xml
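For the logging change, the edit to logging.properties might look something like the fragment below. Keeping a file-only .handlers line rather than removing the line altogether is an assumption on my part, in line with the earlier advice to log to a file and tail it; the handler name is the one from the line quoted above and may differ between Tomcat versions.

```properties
# Original root-logger line, commented out:
# .handlers = 1catalina.org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

# Assumed alternative: keep file logging for the root logger, drop the console.
.handlers = 1catalina.org.apache.juli.FileHandler
```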
These two changes made a big difference for us. We reduced the session-timeout to 5. It might also be worth noting, with regard to the second point above, that the session-timeout can be set in three different ways:
- programmatically, by calling HttpSession.setMaxInactiveInterval(int) in application code (this takes seconds, whereas session-timeout in web.xml is in minutes)
- via definition in $CATALINA_HOME/webapps/ROOT/WEB-INF/web.xml (assuming ROOT installation of Open Wayback)
- via definition in $CATALINA_HOME/conf/web.xml
As far as I understand, option 1 will override 2, and 2 will override 3. The web.xml that ships with Open Wayback does not define session-timeout. We are using option 3.
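For reference, the web.xml setting has the shape shown below, placed inside the web-app element. The value 5 is simply the figure quoted above, not a general recommendation.

```xml
<session-config>
    <!-- Value is in minutes; Tomcat's default is 30. -->
    <session-timeout>5</session-timeout>
</session-config>
```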
https://github.com/nclarkekb/antiaction-common-datastructures
I rewrote my caching flatfile lookup some weeks ago. I extracted it from the original project and placed it in this repository. (I will probably merge it with some other data structure code I have at some point.) The default is to cache 16 levels of the binary lookup and to split the file into 2^13-byte (8k) pages. It uses a custom prefix string matcher and a ByteBuffer-based buffered reader over RandomAccessFile. (Presumably large disks these days have at least 8 KB blocks.)
From what I have seen in our production environment it is pretty fast, even with 7+TB data.