
[Feat] Support auto close spider when all requests finished

[Open] EdmondFrank opened this issue 3 years ago • 2 comments

It seems that even when all the workers' request lists are empty, the crawler still cannot stop automatically.

Although closespider_timeout covers some scenarios, it introduces a new problem: the spider can end early when the network environment is poor.

EdmondFrank avatar Jan 06 '22 15:01 EdmondFrank
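For context, here is a minimal sketch of the kind of configuration being discussed, assuming the usual config/config.exs setup for a Crawly project. Only closespider_timeout is taken from the thread; the other keys and the exact semantics (the spider is closed when fewer than the given number of items are scraped within the check interval) are recalled from Crawly's documentation and should be verified against the version in use.

```elixir
# config/config.exs: illustrative sketch, not taken from the issue itself.
import Config

config :crawly,
  # One request at a time per domain (slow, polite crawling).
  concurrent_requests_per_domain: 1,
  # Assumption: close the spider if fewer than this many items are scraped
  # within the check interval. For a deliberately slow crawler (roughly one
  # request every 60~90 s) this can trigger while requests are still pending,
  # which is the problem described above.
  closespider_timeout: 1,
  # Optional hard cap on the total number of scraped items.
  closespider_itemcount: 1000
```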

Hey @EdmondFrank.

It's not quite clear whether this approach would solve the issue. But still, what is the problem with having just the closespider_timeout?

oltarasenko avatar Feb 01 '22 20:02 oltarasenko

Right now, in the process of using Crawly, I have encountered two problems.

First, I need to develop some slow crawlers, with a request frequency of roughly one request every 60~90 seconds.

Second, some of the websites I crawl are not very stable; they sometimes refuse service for a few minutes before returning to normal.

In both of these scenarios, the scrape rate can drop to 0 items/min, so closespider_timeout will close the spider even though not all requests have been crawled yet.

EdmondFrank avatar Feb 02 '22 01:02 EdmondFrank
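As a rough sketch of what the requested behaviour could look like from the outside: a small monitor process that stops the spider once no requests are left, instead of relying on closespider_timeout. Crawly.Engine.stop_spider/1 is a real Crawly API; requests_pending?/1 below is a hypothetical placeholder for the "all workers' request lists are empty" check described in the issue, which would presumably have to live inside the engine itself.

```elixir
defmodule AutoCloseMonitor do
  @moduledoc """
  Sketch of the requested auto-close behaviour (not Crawly's implementation):
  periodically check whether any requests are pending and stop the spider
  once the queue is empty.
  """
  use GenServer

  @check_interval :timer.seconds(30)

  def start_link(spider), do: GenServer.start_link(__MODULE__, spider)

  @impl true
  def init(spider) do
    schedule_check()
    {:ok, spider}
  end

  @impl true
  def handle_info(:check, spider) do
    if requests_pending?(spider) do
      schedule_check()
      {:noreply, spider}
    else
      # All request queues are empty: close the spider right away instead of
      # waiting for closespider_timeout to fire.
      Crawly.Engine.stop_spider(spider)
      {:stop, :normal, spider}
    end
  end

  defp schedule_check, do: Process.send_after(self(), :check, @check_interval)

  # Hypothetical check; Crawly does not necessarily expose the per-worker
  # request queues, which is essentially what this feature request asks for.
  defp requests_pending?(_spider), do: true
end
```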

This was merged into master on September 14th, the same day Crawly 0.14 was released, but it seems this feature is not part of the 0.14 release. Was that intentional?

revati avatar Dec 26 '22 20:12 revati