webarchive-commons

Issue #4 Guava for public suffix

Open · johnerikhalse opened this issue 10 years ago · 1 comment

Replaces code for looking up public suffixes with similar code from Google Guava.

This change breaks one class in Heritrix: org.archive.crawler.processor.HashCrawlMapper. It should be easy to fix, though.
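For readers unfamiliar with the term, here is a rough sketch of what a public suffix lookup computes. It is not the code from this change and not Guava's implementation (Guava exposes the lookup through InternetDomainName.publicSuffix()). The class name PublicSuffixSketch and the four-entry rule set are hypothetical, and the wildcard and exception rules of the real Public Suffix List are ignored.

```java
import java.util.Locale;
import java.util.Set;

public class PublicSuffixSketch {
    // Hypothetical, tiny rule set for illustration only. The real list has
    // thousands of entries plus wildcard and exception rules.
    private static final Set<String> SUFFIXES = Set.of("com", "org", "uk", "co.uk");

    /** Returns the longest known suffix of the host, or null if none matches. */
    static String publicSuffix(String host) {
        String candidate = host.toLowerCase(Locale.ROOT);
        while (true) {
            // Labels are stripped from the left, so the first hit is the longest match.
            if (SUFFIXES.contains(candidate)) {
                return candidate;
            }
            int dot = candidate.indexOf('.');
            if (dot < 0) {
                return null;
            }
            candidate = candidate.substring(dot + 1);
        }
    }

    /** Returns the public suffix plus one preceding label, or null if there is none. */
    static String topPrivateDomain(String host) {
        String lower = host.toLowerCase(Locale.ROOT);
        String suffix = publicSuffix(lower);
        if (suffix == null || lower.equals(suffix)) {
            return null;
        }
        // Everything before ".<suffix>"; keep only its last label.
        String prefix = lower.substring(0, lower.length() - suffix.length() - 1);
        return prefix.substring(prefix.lastIndexOf('.') + 1) + "." + suffix;
    }

    public static void main(String[] args) {
        System.out.println(publicSuffix("www.example.co.uk"));
        System.out.println(topPrivateDomain("www.example.co.uk"));
    }
}
```

Under this rule set, "co.uk" wins over "uk" for www.example.co.uk because the longer match is found first.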

johnerikhalse, Nov 06 '14 14:11

Haven't reviewed in detail, but this is great.

One concern, though: this removes a method from the public API, namely PublicSuffixes.getTopmostAssignedSurtPrefixRegex(). That probably means the version should be bumped to 1.2.0-SNAPSHOT. The Heritrix class org.archive.crawler.processor.HashCrawlMapper uses that method, so it will have to be rewritten at some point.

nlevitt, Nov 07 '14 21:11