webarchive-commons
Issue #4 Guava for public suffix
Replaces code for looking up public suffixes with similar code from Google Guava.
This change breaks one class in Heritrix: org.archive.crawler.processor.HashCrawlMapper. It should be easy to fix, though.
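For readers unfamiliar with public suffix lookup, here is a minimal sketch of the general idea, not the code in this change. It assumes a tiny, hypothetical rule set with exact rules only, whereas the real Public Suffix List also has wildcard and exception rules. The class and method names (PublicSuffixSketch, publicSuffix, topPrivateDomain) are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class PublicSuffixSketch {
    // Hypothetical stand-in for the Public Suffix List (exact rules only).
    private static final Set<String> SUFFIXES =
            new HashSet<>(Arrays.asList("com", "org", "uk", "co.uk"));

    /** Returns the longest listed suffix of host, or null if none matches. */
    static String publicSuffix(String host) {
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (SUFFIXES.contains(candidate)) {
                return candidate;
            }
            // Drop the leftmost label and try the shorter name.
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    /** Returns the public suffix plus one more label, or null if unavailable. */
    static String topPrivateDomain(String host) {
        String lower = host.toLowerCase(Locale.ROOT);
        String suffix = publicSuffix(lower);
        if (suffix == null || lower.equals(suffix)) {
            return null;
        }
        // Everything left of the suffix, without the separating dot.
        String prefix = lower.substring(0, lower.length() - suffix.length() - 1);
        return prefix.substring(prefix.lastIndexOf('.') + 1) + "." + suffix;
    }

    public static void main(String[] args) {
        System.out.println(publicSuffix("www.example.co.uk"));
        System.out.println(topPrivateDomain("www.example.co.uk"));
    }
}
```

Because labels are stripped from the left, the first match is always the longest one, so "co.uk" wins over "uk" for "www.example.co.uk".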
Haven't reviewed in detail, but this is great.
One concern, though: this removes a method from the public API, namely PublicSuffixes.getTopmostAssignedSurtPrefixRegex(). That probably means the version should be bumped to 1.2.0-SNAPSHOT. The Heritrix class org.archive.crawler.processor.HashCrawlMapper uses that method, so it will have to be rewritten at some point.