crawling-framework

Easily crawl news portals or blog sites using Storm Crawler.

Results: 21 crawling-framework issues

When storing temporary data, ElasticSearch can become a bottleneck. Optionally, use Redis for that instead.

Setting `topology.workers=4` stops the crawl. If crawling fails with more than one worker, there is no point in running it on a Storm cluster.

The Stats button shows the status of the crawl, but if nothing has been crawled, it would be good to see that directly in the table without opening the stats popup....

Currently, configuration can be managed only through the Administration UI.

enhancement

- [ ] Upload CSV with sources, related (#2)
- [ ] Check which ones are already configured.
- [ ] Other validations TODO.
- [ ] Export CSV with...
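A minimal sketch of the first two checklist items, assuming the uploaded CSV has the source URL in its first column and that the already-configured sources are available as a set of URLs. `SourceCsvImport` and its result fields are hypothetical names, not part of the project.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not part of the project): splits the URLs in an
// uploaded CSV into new sources and sources that are already configured.
// Assumes the source URL is in the first column of each row.
final class SourceCsvImport {

    // Outcome of comparing one uploaded CSV against the existing configuration.
    static final class ImportResult {
        final List<String> newSources = new ArrayList<>();
        final List<String> alreadyConfigured = new ArrayList<>();
        final List<String> invalidLines = new ArrayList<>();
    }

    static ImportResult classify(String csv, Set<String> configuredUrls) {
        ImportResult result = new ImportResult();
        Set<String> seen = new LinkedHashSet<>(); // detects duplicates inside the upload
        for (String rawLine : csv.split("\\R")) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue;
            }
            String url = line.split(",", -1)[0].trim();
            // Basic validation: first column must look like an http(s) URL.
            if (!url.startsWith("http://") && !url.startsWith("https://")) {
                result.invalidLines.add(line);
                continue;
            }
            if (!seen.add(url)) {
                continue; // same URL already handled earlier in this file
            }
            if (configuredUrls.contains(url)) {
                result.alreadyConfigured.add(url);
            } else {
                result.newSources.add(url);
            }
        }
        return result;
    }
}
```

Rows whose first column is not an http(s) URL, such as a header row, end up in `invalidLines`, which is one place where the remaining validations could hook in.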

The error should also log the erroneous JSON so that we can learn how to pre-process it and avoid such errors:

```
WARN l.t.c.p.u.JsonLdParser - Failed to parse ld+json com.fasterxml.jackson.core.JsonParseException: Document contains...
```
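One way to get the offending input into the log is to wrap the parse call and append a truncated copy of the document to the warning. This is a sketch using `java.util.logging`; the parser is passed in as a function so the example does not assume any specific Jackson call, and `LoggingJsonParse`, `parseOrLog`, and the 500-character cap are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper: on a parse failure, log the exception together with
// (a truncated copy of) the JSON that could not be parsed.
final class LoggingJsonParse {

    // Any parse function; declared to throw Exception so checked parser
    // exceptions (such as Jackson's) can pass through.
    @FunctionalInterface
    interface Parser<T> {
        T parse(String json) throws Exception;
    }

    private static final Logger LOG = Logger.getLogger(LoggingJsonParse.class.getName());
    static final int MAX_LOGGED_CHARS = 500; // assumed cap, keeps huge pages out of the log

    // Returns the parsed value, or null after logging the offending input.
    static <T> T parseOrLog(String json, Parser<T> parser) {
        try {
            return parser.parse(json);
        } catch (Exception e) {
            LOG.log(Level.WARNING, describeFailure(json), e);
            return null;
        }
    }

    // Builds the warning text, truncating very long documents.
    static String describeFailure(String json) {
        String snippet = json.length() > MAX_LOGGED_CHARS
                ? json.substring(0, MAX_LOGGED_CHARS) + "... [" + json.length() + " chars total]"
                : json;
        return "Failed to parse ld+json, input was: " + snippet;
    }
}
```

Truncating keeps a single huge page from flooding the log while still showing where the malformed JSON starts.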