bejean
see https://groups.google.com/forum/#!topic/crawl-anywhere/3WPCZuwtZCc
Implement support for the NTLM authentication scheme, as requested in this forum message: https://groups.google.com/forum/#!topic/crawl-anywhere/TiAz0rGiIfw
Implement a multi-term suggester (see http://wiki.apache.org/solr/Suggester and http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/). At the same time, check the "did you mean" feature.
- Check logging consistency (verbose / no verbose)
- Change the action option in the testScript class: "To test the meta extraction with the script tool, you need to use the...
- Allow removing a value in the target element (https://groups.google.com/forum/#!topic/crawl-anywhere/KmsyjPsw_vA)
- Check the documentation
- Add a unit test
There are several direct dependencies on HTML parser libraries:
- jsoup
- jericho-html
- htmlcleaner
Try to keep only jsoup (already used by snacktory).
Redirect to the login page whenever a session timeout occurs.
To determine the real publish date of a document, use the date provided by sitemap files when it is available.
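As a rough illustration of the sitemap idea, the sketch below reads the optional `<lastmod>` value for each `<url>` entry of a sitemap using only the JDK XML parser. The class name `SitemapDates` and the method `lastModByUrl` are hypothetical and not part of the existing code base; how the crawler would actually store or prefer this date is left open.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, for illustration only.
public class SitemapDates {

    // Maps each <loc> to its raw <lastmod> string.
    // <url> entries without a <lastmod> are skipped, so callers can
    // fall back to another date source for those documents.
    public static Map<String, String> lastModByUrl(String sitemapXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Sitemaps come from untrusted sites: refuse DOCTYPE declarations.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(sitemapXml.getBytes(StandardCharsets.UTF_8)));

            Map<String, String> result = new LinkedHashMap<>();
            NodeList urls = doc.getElementsByTagNameNS("*", "url");
            for (int i = 0; i < urls.getLength(); i++) {
                Element url = (Element) urls.item(i);
                String loc = firstText(url, "loc");
                String lastMod = firstText(url, "lastmod");
                if (loc != null && lastMod != null) {
                    result.put(loc, lastMod);
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Unable to parse sitemap", e);
        }
    }

    // Returns the trimmed text of the first descendant with the given local name, or null.
    private static String firstText(Element parent, String localName) {
        NodeList nodes = parent.getElementsByTagNameNS("*", localName);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

The value is returned as a raw string because `<lastmod>` may be a plain date or a full timestamp; parsing it into a date type would be a separate step.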