
[Feat] Support auto close spider when all requests finished

[Open] EdmondFrank opened this issue 3 years ago • 2 comments

It seems that even when all the workers' request lists are empty, the crawler still cannot stop automatically.

Although closespider_timeout covers some scenarios, it introduces a new problem: the spider can end early when the network environment is poor.

EdmondFrank avatar Jan 06 '22 15:01 EdmondFrank
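For context, here is a minimal sketch of the kind of configuration being discussed, assuming the usual config/config.exs setup for a Crawly project. Only closespider_timeout is taken from the thread; the other keys and the exact semantics (the spider is closed when fewer than the given number of items are scraped within the check interval) are recalled from Crawly's documentation and should be verified against the version in use.

```elixir
# config/config.exs: illustrative sketch, not taken from the issue itself.
import Config

config :crawly,
  # One request at a time per domain (slow, polite crawling).
  concurrent_requests_per_domain: 1,
  # Assumption: close the spider if fewer than this many items are scraped
  # within the check interval. For a deliberately slow crawler (roughly one
  # request every 60~90 s) this can trigger while requests are still pending,
  # which is the problem described above.
  closespider_timeout: 1,
  # Optional hard cap on the total number of scraped items.
  closespider_itemcount: 1000
```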

Hey @EdmondFrank.

It's not quite clear whether this approach would solve the issue. But still, what is the problem with having just the closespider_timeout?

oltarasenko avatar Feb 01 '22 20:02 oltarasenko

Right now, in the process of using Crawly, I have encountered two problems.

First, I need to develop some slow crawlers, with a request frequency of roughly one request every 60~90 seconds.

Second, some of the websites I crawl are not very stable; they sometimes refuse service for a few minutes before returning to normal.

In both of these scenarios, the scrape rate can drop to 0 items/min, so closespider_timeout will close the spider even though not all requests have been crawled yet.

EdmondFrank avatar Feb 02 '22 01:02 EdmondFrank
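As a rough sketch of what the requested behaviour could look like from the outside: a small monitor process that stops the spider once no requests are left, instead of relying on closespider_timeout. Crawly.Engine.stop_spider/1 is a real Crawly API; requests_pending?/1 below is a hypothetical placeholder for the "all workers' request lists are empty" check described in the issue, which would presumably have to live inside the engine itself.

```elixir
defmodule AutoCloseMonitor do
  @moduledoc """
  Sketch of the requested auto-close behaviour (not Crawly's implementation):
  periodically check whether any requests are pending and stop the spider
  once the queue is empty.
  """
  use GenServer

  @check_interval :timer.seconds(30)

  def start_link(spider), do: GenServer.start_link(__MODULE__, spider)

  @impl true
  def init(spider) do
    schedule_check()
    {:ok, spider}
  end

  @impl true
  def handle_info(:check, spider) do
    if requests_pending?(spider) do
      schedule_check()
      {:noreply, spider}
    else
      # All request queues are empty: close the spider right away instead of
      # waiting for closespider_timeout to fire.
      Crawly.Engine.stop_spider(spider)
      {:stop, :normal, spider}
    end
  end

  defp schedule_check, do: Process.send_after(self(), :check, @check_interval)

  # Hypothetical check; Crawly does not necessarily expose the per-worker
  # request queues, which is essentially what this feature request asks for.
  defp requests_pending?(_spider), do: true
end
```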

This was merged into master on September 14th, the same day Crawly 0.14 was released, but it seems this feature is not part of the 0.14 release. Was that intentional?

revati avatar Dec 26 '22 20:12 revati