A scalable web crawler framework for Java.

Results: 147 webmagic issues

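PriorityScheduler source, as in the screenshot: ![image](https://user-images.githubusercontent.com/22490427/117938486-aeaccc80-b339-11eb-9114-5ae4a479dd12.png) Question: why does it need three queues? Wouldn't it be enough to replace the queue in QueueScheduler with a PriorityBlockingQueue? Also, the count of remaining requests looks wrong: it only counts one of the queues. Could the author take a look? QueueScheduler source, as in the screenshot: ![image](https://user-images.githubusercontent.com/22490427/117938791-fcc1d000-b339-11eb-9b13-de356cf68421.png) I'd appreciate the author's guidance, thanks!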
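A minimal sketch of the single-queue idea raised in this issue, under stated assumptions: `Req` is a hypothetical stand-in for webmagic's `Request` (URL and priority only), duplicate removal is left out, and the class names are illustrative, not webmagic API. One catch with a plain swap is that `PriorityBlockingQueue` makes no guarantee about the order of elements with equal priority, so requests at the same priority could be reordered. The sketch breaks ties with an insertion counter (the approach the `PriorityBlockingQueue` Javadoc itself suggests), and its count covers every pending request regardless of priority. Whether FIFO ordering is the reason the original uses separate queues is not stated in the issue.

```java
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical request type: only a URL and a priority (higher = sooner).
class Req {
    final String url;
    final long priority;

    Req(String url, long priority) {
        this.url = url;
        this.priority = priority;
    }
}

// Sketch: one PriorityBlockingQueue instead of three separate queues.
class SingleQueuePriorityScheduler {
    // Pairs a request with its insertion sequence number.
    private static final class Entry {
        final Req req;
        final long seq;

        Entry(Req req, long seq) {
            this.req = req;
            this.seq = seq;
        }
    }

    private final AtomicLong counter = new AtomicLong();

    // Higher priority first; equal priorities fall back to insertion (FIFO) order.
    private final PriorityBlockingQueue<Entry> queue = new PriorityBlockingQueue<>(
            11,
            Comparator.comparingLong((Entry e) -> e.req.priority).reversed()
                    .thenComparingLong(e -> e.seq));

    public void push(Req req) {
        queue.add(new Entry(req, counter.getAndIncrement()));
    }

    // Non-blocking: returns null when nothing is pending.
    public Req poll() {
        Entry e = queue.poll();
        return e == null ? null : e.req;
    }

    // Counts all pending requests, whatever their priority.
    public int getLeftRequestsCount() {
        return queue.size();
    }
}
```

With pushes of `a` (0), `b` (0), `c` (5), and `d` (-1), polling yields `c`, `a`, `b`, `d`, and the count starts at 4.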

I can't find any crawler policy or property that restricts crawling depth. Is it missing, and is the only way to restrict depth to choose a suitable selector in the PageProcessor?
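One common workaround, sketched here as an assumption rather than a built-in webmagic feature, is to carry a depth counter with each queued URL and stop following links once it reaches a limit. In webmagic that counter could plausibly live in the request's extras (`Request.putExtra` / `getExtra`), with link-adding skipped at the limit; verify this against your version. To stay runnable without webmagic, the sketch below models the web as a hypothetical in-memory link map and does a breadth-first crawl.

```java
import java.util.*;

// Hypothetical depth-limited breadth-first crawl over an in-memory link graph.
class DepthLimitedCrawl {
    // A queued URL plus the depth at which it was discovered (seed = 0).
    private static final class Task {
        final String url;
        final int depth;

        Task(String url, int depth) {
            this.url = url;
            this.depth = depth;
        }
    }

    static List<String> crawl(Map<String, List<String>> links, String seed, int maxDepth) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<Task> queue = new ArrayDeque<>();
        queue.add(new Task(seed, 0));
        seen.add(seed);
        while (!queue.isEmpty()) {
            Task t = queue.poll();
            visited.add(t.url);                 // "process" the page
            if (t.depth >= maxDepth) continue;  // at the limit: don't follow its links
            for (String next : links.getOrDefault(t.url, Collections.emptyList())) {
                if (seen.add(next)) {           // enqueue each URL only once
                    queue.add(new Task(next, t.depth + 1));
                }
            }
        }
        return visited;
    }
}
```

For a graph `a -> [b, c]`, `b -> [d]`, `d -> [e]`, a limit of 1 visits `a, b, c`, and a limit of 2 also visits `d`.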

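For example, some download URLs fail with read timeouts because of network fluctuations. In that case, onError needs to receive the exception so that the failure can be handled promptly.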
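A sketch of the kind of callback being requested, using a hypothetical `ErrorAwareListener` interface. This is not webmagic's actual `SpiderListener` signature (check your version). Once the cause is passed in, a listener can treat a read timeout as transient and retry it, while recording other failures as hard errors.

```java
import java.net.SocketTimeoutException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener shape: onError receives the cause, not just the URL.
interface ErrorAwareListener {
    void onSuccess(String url);

    void onError(String url, Exception cause);
}

// Example listener that separates transient timeouts from other failures.
class RetryOnTimeoutListener implements ErrorAwareListener {
    final List<String> toRetry = new ArrayList<>();
    final List<String> failed = new ArrayList<>();

    @Override
    public void onSuccess(String url) {
        // nothing to do in this sketch
    }

    @Override
    public void onError(String url, Exception cause) {
        if (cause instanceof SocketTimeoutException) {
            toRetry.add(url);   // network hiccup: worth retrying
        } else {
            failed.add(url);    // anything else: record as a hard failure
        }
    }
}
```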

Throws an exception when the waiting time has detectably elapsed before returning from the method.
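The wording matches the Javadoc of `java.util.concurrent.locks.Condition`, where `await(long, TimeUnit)` returns `false` if the waiting time detectably elapsed before return from the method, and `awaitNanos` signals the same condition by returning a value of zero or less. Below is a sketch of turning that result into an exception instead of silently continuing. `TimedWaiter` and its methods are hypothetical, not the code changed in this PR. The loop recomputes the remaining time so that spurious wakeups don't extend the total wait.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical waiter: blocks until signalDone() or throws on timeout.
class TimedWaiter {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition ready = lock.newCondition();
    private boolean done = false;

    void signalDone() {
        lock.lock();
        try {
            done = true;
            ready.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits for done; throws TimeoutException instead of returning quietly.
    void awaitDone(long timeout, TimeUnit unit) throws InterruptedException, TimeoutException {
        lock.lock();
        try {
            long nanos = unit.toNanos(timeout);
            while (!done) {
                if (nanos <= 0L) {
                    throw new TimeoutException("timed out after " + timeout + " " + unit);
                }
                nanos = ready.awaitNanos(nanos); // remaining time; <= 0 means elapsed
            }
        } finally {
            lock.unlock();
        }
    }
}
```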

- diamond operator (since Java 7)
- naming conventions
- duplication
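The first item refers to Java 7's diamond operator, which lets the compiler infer constructor type arguments from the declared type. Here is a small before/after illustration. `DiamondExample` and its methods are hypothetical, not code from the PR. The inner list is assigned to a local variable so the diamond form also compiles under Java 7's more limited inference.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DiamondExample {
    // Before Java 7: type arguments repeated on both sides.
    static Map<String, List<String>> verbose() {
        Map<String, List<String>> m = new HashMap<String, List<String>>();
        List<String> urls = new ArrayList<String>();
        m.put("seed", urls);
        return m;
    }

    // Java 7+: the diamond <> infers the type arguments from the declaration.
    static Map<String, List<String>> withDiamond() {
        Map<String, List<String>> m = new HashMap<>();
        List<String> urls = new ArrayList<>();
        m.put("seed", urls);
        return m;
    }
}
```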

Adding a try-catch-finally block to properly close the configFileReader file.
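The PR's actual code isn't shown here, so this is a sketch under assumptions. It uses a hypothetical `ConfigLoader` that reads a `java.util.Properties` config from a `Reader` named after the `configFileReader` in the description. It shows the try-catch-finally shape, which closes the reader even when loading fails, next to the Java 7+ try-with-resources form that does the same automatically.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Properties;

class ConfigLoader {
    // try-catch-finally: close in finally so the reader is released even if load() throws.
    static Properties loadWithFinally(Reader configFileReader) {
        Properties props = new Properties();
        try {
            props.load(configFileReader);
        } catch (IOException e) {
            throw new IllegalStateException("could not read config", e);
        } finally {
            try {
                configFileReader.close();
            } catch (IOException ignored) {
                // closing failed; nothing more we can do here
            }
        }
        return props;
    }

    // Java 7+ equivalent: try-with-resources closes the reader automatically.
    static Properties loadWithResources(Reader configFileReader) throws IOException {
        try (Reader r = configFileReader) {
            Properties props = new Properties();
            props.load(r);
            return props;
        }
    }
}
```

try-with-resources is usually the better choice for new code. It avoids the nested try in `finally`, and if both loading and closing fail it keeps the close failure as a suppressed exception instead of losing it.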

Found a code smell: a missing decorator. This is a fix for that code smell; I'm practicing pull requests for school.
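"Decorator" is Python terminology. In Java the closest counterpart is an annotation, and a frequently flagged missing-annotation smell is an absent `@Override`. That reading is an assumption, since the PR text does not name the annotation. Below is a sketch with hypothetical classes showing what the annotation buys.

```java
class BaseHandler {
    String name() {
        return "base";
    }
}

class LoggingHandler extends BaseHandler {
    // @Override asks the compiler to verify this really overrides BaseHandler.name();
    // a misspelling such as "nmae()" would then fail to compile instead of
    // silently declaring a new, unrelated method.
    @Override
    String name() {
        return "logging";
    }
}
```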

1. Add the @Deprecated annotation together with the @deprecated Javadoc tag, so that tools such as IDEs warn about references to deprecated elements and highlight to the user when the element...
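Here is a sketch of the pairing described in item 1, using a hypothetical `LegacyApi` class with `getTitle` and `title` methods. The `@Deprecated` annotation is what compilers and IDEs act on: it produces deprecation warnings at call sites and is retained at runtime. The `@deprecated` Javadoc tag explains to readers why the element is deprecated and what to use instead.

```java
class LegacyApi {
    /**
     * Returns the page title.
     *
     * @deprecated use {@link #title()} instead; kept only for backward compatibility.
     */
    @Deprecated
    static String getTitle() {
        return title(); // delegate so old callers keep working
    }

    static String title() {
        return "webmagic";
    }
}
```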

Hello! In this pull request, I'm correcting and removing code smells, and also refactoring some methods for better readability.