
Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.

Results 38 crawl-anywhere issues

Files are not found when I try to download from: http://www.crawl-anywhere.com/download-crawl-anywhere/

Resolves issue https://github.com/bejean/crawl-anywhere/issues/88

Fix for https://github.com/bejean/crawl-anywhere/issues/86

The output file from the pipeline does not contain the correct title and content, and when it is indexed into Solr the title shows up as ???????? title sample ?????????????????????????????????...
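A symptom like this is often, though not necessarily in this case, caused by text being encoded with a charset that cannot represent its characters. The sketch below is only an illustration of that general effect in Java; `EncodingDemo` and `roundTrip` are hypothetical names and are not part of Crawl-Anywhere.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not Crawl-Anywhere code.
final class EncodingDemo {

    // Encode the text to bytes with the given charset, then decode it again.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String title = "Пример"; // six Cyrillic characters

        // ISO-8859-1 has no Cyrillic, so every character becomes '?'.
        System.out.println(roundTrip(title, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent the text, so it survives unchanged.
        System.out.println(roundTrip(title, StandardCharsets.UTF_8));
    }
}
```

If this is the cause, checking which charset is used when the pipeline writes its output file and when the document is sent to Solr would be a reasonable first step.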

I tried to set up version 4.x.x, but the front-end view is not configured properly due to a PHP function issue. I am using PHP version 5.3.x. Can anyone help resolve it?

Currently the "bypass_robots_file" property can only be configured globally. It would be useful to be able to configure it on a per-host basis, ideally via the admin UI.
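As a rough sketch of what the request could look like in code, a per-host override map can fall back to the global setting. Everything here is an assumption for illustration: `RobotsBypassPolicy`, `setForHost`, and `shouldBypass` are hypothetical names, not existing Crawl-Anywhere classes or methods.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Crawl-Anywhere.
final class RobotsBypassPolicy {

    // Global default, playing the role of the existing "bypass_robots_file" property.
    private final boolean globalBypass;

    // Optional per-host overrides, keyed by normalized host name.
    private final Map<String, Boolean> perHost = new HashMap<>();

    RobotsBypassPolicy(boolean globalBypass) {
        this.globalBypass = globalBypass;
    }

    // Register an override for a single host.
    void setForHost(String host, boolean bypass) {
        perHost.put(normalize(host), bypass);
    }

    // Use the host-specific value if one exists, otherwise the global default.
    boolean shouldBypass(String host) {
        return perHost.getOrDefault(normalize(host), globalBypass);
    }

    // Host names are case-insensitive, so compare them in lower case.
    private static String normalize(String host) {
        return host.trim().toLowerCase(Locale.ROOT);
    }
}
```

With a global default of `false`, calling `setForHost("example.com", true)` would make `shouldBypass("example.com")` return `true` while every other host keeps honoring robots.txt.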

In the fr.eolya.utils.http.HttpLoader.getAuthCookies method:

``` java
// A CookieStore object is created
CookieStore cookieStore = new BasicCookieStore();
HttpContext localContext = new BasicHttpContext();
// ... and set
localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
/* ... */
```

...

When trying to test the link-finding action with tools_test_scripts.sh, I never see any output of found links, even when hardcoding them. It does not show exceptions either (after intentionally...

bug

Hi, the latest version of the Tika wrapper seems to have fixed some international sites, but on some sites the title is still not parsed correctly. Example URL:...

When I import a source, the source is not crawled. I exported a running source and compared it with a newly imported source, and they look different. And the...

bug