Warning: Queue head returned a request that is already in progress
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/core
Issue description
We have noticed a warning message in our crawlee logs, and we'd like to address it more elegantly:
Queue head returned a request that is already in progress?!{"nextRequestId":"UYy6hMyaThHCBAF","inProgress":true,"recentlyHandled":false}
This issue specifically arises when we use the sameDomainDelaySecs feature with crawlee 3.5.4. Interestingly, we do not encounter the problem when using the same feature with an earlier crawlee release, so we suspect this warning may be connected to the fix in https://github.com/apify/crawlee/pull/2045 .
Could someone kindly investigate this matter further?
Code sample
No response
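For illustration only (no sample was attached to the report), a minimal setup that exercises sameDomainDelaySecs might look roughly like the following sketch; the crawler class, URL, and delay value are assumptions, not taken from the report:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // Wait a few seconds between requests to the same domain.
    sameDomainDelaySecs: 5,
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Processing ${request.url}`);
        // Enqueue more same-domain links so the delay logic actually applies.
        await enqueueLinks();
    },
});

await crawler.run(['https://example.com']);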
Package version
3.5.4
Node.js version
18.17.0
Operating system
No response
Apify platform
- [ ] Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response
@B4nan
Hey @B4nan, any update on this?
@B4nan any update?
When there are updates, you will see them.
https://sindresorhus.com/blog/issue-bumping
Note that sameDomainDelaySecs is a feature contributed by the community, and it apparently introduces some new issues. We simply do not have the capacity to tackle those right now, and as a community feature, it is unlikely to get high priority any time soon.
I was encountering Queue head returned a request that is already in progress?! and Error: Lock file is already being held.
Disabling sameDomainDelaySecs fixed it. Hope that helps anyone else searching for the lock file error.
I ran into this issue trying to use sameDomainDelaySecs: 1 and maxConcurrency: 1 to slow down crawling of a website, mainly out of a sense of civic responsibility. I tried the experimental RequestQueueV2, but found that it ended up hanging forever.
The solution I found was to use a preNavigationHook:
preNavigationHooks: [
    async () => {
        // sleep for 1 second
        await new Promise((resolve) => setTimeout(resolve, 1000));
    },
],
This accomplished what I wanted: crawling now proceeds at the desired speed, with no odd warnings or errors. Note that it only has the desired effect with maxConcurrency: 1; at higher concurrency there are no guarantees about the delay between hits to the same site.
I hope the issues with sameDomainDelaySecs are resolved soon, as that would be the preferable approach.
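For anyone who wants to try this workaround, a fuller sketch of the setup described above might look like this; the crawler class, URL, and handler are illustrative assumptions, since the comment only showed the preNavigationHooks fragment:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // Leave sameDomainDelaySecs unset and throttle instead with a single
    // concurrent request plus a fixed pause before every navigation.
    maxConcurrency: 1,
    preNavigationHooks: [
        async () => {
            // Wait one second before each page load.
            await new Promise((resolve) => setTimeout(resolve, 1000));
        },
    ],
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Crawling ${request.url}`);
        await enqueueLinks();
    },
});

await crawler.run(['https://example.com']);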