Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.

Results 38 crawl-anywhere issues

As free IP geolocation web services are often unavailable or deprecated, allow easy implementation of a custom class. http://www.geoiptool.com/ no longer provides information as XML.

enhancement

Add a maximum page count option. Should this be the maximum number of pages fetched on the server or the maximum number of pages sent to the pipeline? This...

enhancement

Create a fast recrawl option. This option could allow a web site to be recrawled often and quickly by crawling only to a maximum depth of 1 or 2 levels for...

enhancement

https://groups.google.com/forum/#!topic/crawl-anywhere/tdkJNIjuB5E

Task

see https://groups.google.com/forum/#!topic/crawl-anywhere/3WPCZuwtZCc

enhancement

see https://groups.google.com/forum/#!topic/crawl-anywhere/3WPCZuwtZCc

enhancement

According to this message in the forum, implement support for the NTLM authentication scheme: https://groups.google.com/forum/#!topic/crawl-anywhere/TiAz0rGiIfw

enhancement

Redesign the admin web UI with Twitter Bootstrap.

enhancement

Implement a multi-term suggester. See http://wiki.apache.org/solr/Suggester and http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/ At the same time, check the "did you mean" feature.

enhancement

- Check logging consistency (verbose / no verbose)
- Change the action option in testScript class "To test the meta extraction with the script tool, you need to use the...

enhancement