bejean

29 issues by bejean

In the search interface, add an option to boost recent documents based on either the real publish date or the first crawl date.

enhancement
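A minimal sketch of the arithmetic behind the recency boost requested above. Solr exposes this kind of boost through its `recip` and `ms` function queries (for example `recip(ms(NOW,date_field),3.16e-11,1,1)`, where `date_field` is a placeholder), and the Python below reproduces the same `a / (m*x + b)` formula. The function name `recency_boost` and the rule "use the publish date when known, otherwise fall back to the first crawl date" are assumptions for illustration, not crawl-anywhere's actual behavior.

```python
from datetime import datetime
from typing import Optional

# Milliseconds in a 365-day year. With m = 1/MS_PER_YEAR and a = b = 1,
# a document one year old gets half the boost of a brand-new one.
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000


def recip(x: float, m: float, a: float, b: float) -> float:
    """Reciprocal function a / (m*x + b), the same formula as Solr's recip()."""
    return a / (m * x + b)


def recency_boost(publish_date: Optional[datetime],
                  first_crawl_date: datetime,
                  now: datetime) -> float:
    """Hypothetical helper: boost factor in (0, 1] based on document age.

    Assumption: the real publish date is preferred when known; otherwise
    the first crawl date is used as the document's reference date.
    """
    reference = publish_date if publish_date is not None else first_crawl_date
    # Clamp negative ages (dates in the future) to zero.
    age_ms = max((now - reference).total_seconds() * 1000, 0.0)
    return recip(age_ms, 1.0 / MS_PER_YEAR, 1.0, 1.0)
```

In a search request, the resulting factor would typically be multiplied into the relevance score, so newer documents rise without completely hiding older but more relevant ones.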

Use Elasticsearch as an alternative to Solr. This implies:
- creating a pipeline mapping stage
- updating the indexer
- updating the search interface
Advantages:
- dynamic mapping for better multi-lingual indexing...

enhancement
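One way the dynamic-mapping advantage mentioned above could work is sketched below: an Elasticsearch index mapping whose `dynamic_templates` choose a built-in language analyzer from a field-name suffix. The naming convention `<field>_<lang>` (e.g. `content_fr`), the language list, and the helper `build_dynamic_templates` are hypothetical; the mapping layout assumes the typeless format of Elasticsearch 7 and later.

```python
import json

# Hypothetical convention: the indexer writes per-language text fields
# named <field>_<lang>, e.g. "content_fr" or "title_de".
LANGUAGE_ANALYZERS = {
    "en": "english",
    "fr": "french",
    "de": "german",
    "es": "spanish",
}


def build_dynamic_templates(analyzers: dict[str, str]) -> dict:
    """Build an index mapping that picks an analyzer per language suffix."""
    templates = []
    for lang, analyzer in analyzers.items():
        templates.append({
            f"text_{lang}": {
                # Only applies to fields detected as strings in the JSON source.
                "match_mapping_type": "string",
                "match": f"*_{lang}",
                "mapping": {"type": "text", "analyzer": analyzer},
            }
        })
    return {"mappings": {"dynamic_templates": templates}}


if __name__ == "__main__":
    # Body for a create-index request (PUT /<index>).
    print(json.dumps(build_dynamic_templates(LANGUAGE_ANALYZERS), indent=2))
```

With such templates, a new language only needs a new suffix and analyzer entry rather than an explicit schema change, which is the main contrast with a static Solr schema.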

Create pipeline stages to add NLP features such as:
- named entity extraction
- summarization
Look at:
- Weka (http://www.cs.waikato.ac.nz/~ml/index.html)
- OpenNLP
- Gate
- UIMA

enhancement
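The shape of an NLP pipeline stage like the ones proposed above is sketched below as a minimal sketch under stated assumptions: a stage is assumed to take a document as a dict with a `content` field and return it enriched with a new field. `SummarizationStage` and its `process` method are hypothetical names, and the scoring is a naive word-frequency heuristic swapped in for illustration; it is not how Weka, OpenNLP, Gate, or UIMA work.

```python
import re
from collections import Counter


class SummarizationStage:
    """Hypothetical pipeline stage that adds an extractive 'summary' field.

    Sentences are scored by the summed corpus frequency of their words
    (a naive heuristic), and the top-scoring ones are kept in their
    original order.
    """

    def __init__(self, max_sentences: int = 2):
        self.max_sentences = max_sentences

    def process(self, doc: dict) -> dict:
        text = doc.get("content", "")
        # Split after sentence-ending punctuation followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
        freq = Counter(re.findall(r"\w+", text.lower()))

        def score(sentence: str) -> int:
            return sum(freq[w] for w in re.findall(r"\w+", sentence.lower()))

        # Rank sentence indices by score, then restore document order.
        ranked = sorted(range(len(sentences)),
                        key=lambda i: score(sentences[i]), reverse=True)
        keep = sorted(ranked[: self.max_sentences])
        doc["summary"] = " ".join(sentences[i] for i in keep)
        return doc
```

A named entity extraction stage would follow the same `process(doc) -> doc` contract, delegating the actual tagging to one of the libraries listed in the issue.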

Add a "check for deletion" period parameter, to avoid checking for deleted pages at every crawl. A value of 0 disables the check for deletion.

enhancement
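The period parameter described above could gate the deletion check roughly as follows. This is a minimal sketch: the function name `should_check_for_deletion`, the assumption that the period is expressed in days, and the use of a stored last-check timestamp are all hypothetical; only the rule "0 disables the check" comes from the issue.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_check_for_deletion(period_days: int,
                              last_check: Optional[datetime],
                              now: datetime) -> bool:
    """Decide whether the current crawl should run the check for deletion.

    A period of 0 disables the check entirely. Otherwise the check runs
    if it has never run, or if at least period_days have elapsed since
    the last check (the unit "days" is an assumption).
    """
    if period_days < 0:
        raise ValueError("period_days must be >= 0")
    if period_days == 0:
        return False
    if last_check is None:
        return True
    return now - last_check >= timedelta(days=period_days)
```

Storing the last-check timestamp per web site would let sites with frequently changing content use a short period while static sites are checked rarely.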

Some pages need to be rewritten:
- http://www.crawl-anywhere.com/configure-a-web-site-to-be-crawled
- Done: http://www.crawl-anywhere.com/solr-3-x-or-solr-4-x/

enhancement

https://groups.google.com/forum/#!topic/crawl-anywhere/s6Bdz2ZW-28

Task

https://groups.google.com/forum/#!topic/crawl-anywhere/pyGVxCwsMOw

Task

https://groups.google.com/forum/#!topic/crawl-anywhere/B6CNSiWYCzw

Task

Terminated website crawls remain in the crawling list for a very long time.

Task