
Warning: Queue head returned a request that is already in progress

Open · teammakdi opened this issue 2 years ago · 6 comments

Which package is this bug report for? If unsure which one to select, leave blank

@crawlee/core

Issue description

We have noticed a recurring warning in our crawlee logs and would like to address it properly:

Queue head returned a request that is already in progress?! {"nextRequestId":"UYy6hMyaThHCBAF","inProgress":true,"recentlyHandled":false}

This issue specifically arises when we use the sameDomainDelaySecs feature with crawlee 3.5.4. Interestingly, we do not encounter the problem when using the same feature on an earlier version of crawlee, so we suspect the warning is connected to this fix: https://github.com/apify/crawlee/pull/2045.
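For reference, a minimal setup along the lines of ours (simplified for illustration; the URL, delay value, and handler body below are placeholders, not our production code):

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // Delay consecutive requests to the same domain - the feature that
    // appears to trigger the warning on 3.5.4.
    sameDomainDelaySecs: 5,
    async requestHandler({ request, enqueueLinks, log }) {
        log.info(`Processing ${request.url}`);
        await enqueueLinks();
    },
});

await crawler.run(['https://example.com']);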

Could someone kindly investigate this matter further?

Code sample

No response

Package version

3.5.4

Node.js version

18.17.0

Operating system

No response

Apify platform

  • [ ] Tick me if you encountered this issue on the Apify platform

I have tested this on the next release

No response

Other context

No response

teammakdi avatar Sep 14 '23 09:09 teammakdi

@B4nan

teammakdi avatar Sep 20 '23 10:09 teammakdi

Hey @B4nan any update on this?

teammakdi avatar Oct 17 '23 06:10 teammakdi

@B4nan any update?

teammakdi avatar Oct 25 '23 14:10 teammakdi

When there are updates, you will see them.

https://sindresorhus.com/blog/issue-bumping

Note that sameDomainDelaySecs is a community-contributed feature, and it apparently introduced some new issues. We simply do not have the capacity to tackle them right now, and as a community feature it most likely won't get high priority any time soon.

B4nan avatar Oct 25 '23 14:10 B4nan

I was encountering Queue head returned a request that is already in progress?! and Error: Lock file is already being held.

Disabling sameDomainDelaySecs fixed it. Hope that helps anyone else searching for the lock file error.
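For anyone wanting the exact change, it was roughly this (the crawler class and values here are illustrative, not my actual setup):

import { PlaywrightCrawler } from 'crawlee';

const crawler = new PlaywrightCrawler({
    // Commenting out the delay option made both the "Queue head" warning
    // and the lock file error disappear for me.
    // sameDomainDelaySecs: 2,
    async requestHandler({ request, log }) {
        log.info(`Processing ${request.url}`);
    },
});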

thkruz avatar Dec 23 '23 18:12 thkruz

I ran into this issue trying to use sameDomainDelaySecs: 1 and maxConcurrency: 1 to slow down crawling of a website, mainly out of a sense of civic responsibility. I tried using the experimental RequestQueueV2 but found that it ended up hanging indefinitely.

The solution I found was to use a preNavigationHook:

preNavigationHooks: [
    async () => {
        // sleep for 1 second before every navigation
        await new Promise((resolve) => setTimeout(resolve, 1000));
    },
],

This accomplished what I wanted: crawling now proceeds at the desired speed, with no odd warnings or errors. Note that this only has the desired effect with maxConcurrency: 1; at any higher concurrency there is no guarantee of a delay between hits to the same site. A sketch of the full wiring follows below.
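Putting it together, the full wiring looks roughly like this (the crawler class is an assumption on my part; substitute whichever one you use):

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // A single concurrent request is what turns the fixed sleep into a
    // reliable per-request delay; with higher concurrency the sleeps overlap.
    maxConcurrency: 1,
    preNavigationHooks: [
        async () => {
            // sleep for 1 second before every navigation
            await new Promise((resolve) => setTimeout(resolve, 1000));
        },
    ],
    async requestHandler({ enqueueLinks }) {
        await enqueueLinks();
    },
});

await crawler.run(['https://example.com']);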

I'm hoping the issues with sameDomainDelaySecs are resolved soon, as that would be the preferable approach.

RLesser avatar Jan 27 '24 23:01 RLesser