RequestQueue locks might get lost in specific scenarios
Error scenario
- fetch and lock 25 requests in RequestQueue, lock time is 60s
- each request takes 10s to handle, which is within the request handler timeout
- after a couple of requests, the "locally dequeued" requests won't be locked anymore (25 requests at 10s each is 250s of processing, so requests near the end of the batch outlive the 60s lock long before they are handled)
This gets worse if the user runs CPU-bound work that stalls the Node.js event loop.
Possible solutions
- ignore it
- run a background loop that periodically prolongs all held locks (see the sketch after this list)
- prolong locks in fetchNextRequest - that should be called at least once per requestHandlerTimeout
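A rough sketch of the background-loop option, assuming the queue's storage client is reachable as requestQueue.client and that the caller keeps its own set of locally locked request IDs (heldRequestIds below is hypothetical bookkeeping, not an existing Crawlee API):

const LOCK_SECS = 60;

// Periodically prolongs the lock on every request we currently hold locally,
// so slow request handlers cannot outlive the lock time.
function startLockKeepAlive(requestQueue, heldRequestIds) {
    const timer = setInterval(async () => {
        for (const id of heldRequestIds) {
            try {
                await requestQueue.client.prolongRequestLock(id, { lockSecs: LOCK_SECS });
            } catch (err) {
                // The lock may have already expired and been claimed by another client.
                console.warn(`Could not prolong lock for request ${id}: ${err.message}`);
            }
        }
    }, (LOCK_SECS / 2) * 1000); // refresh well before the lock elapses
    timer.unref(); // don't keep the process alive just for this loop
    return () => clearInterval(timer);
}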
Hi @janbuchar, I've run several tests to try to reproduce the RequestQueue lock issue with different configurations:
- Initial test with 25 requests and 5 concurrent requests:
  - All requests processed successfully
  - No lock issues observed
  - Average request duration: ~12 seconds
- More aggressive test with 100 requests and 20 concurrent requests:
  - All requests processed successfully
  - No lock issues observed
  - Average request duration: ~4.7 seconds
  - Requests finished per minute: 61
- Most aggressive test with 200 requests and 50 concurrent requests:
  - All requests processed successfully
  - No lock issues observed
  - Average request duration: ~2.8 seconds
  - Requests finished per minute: 52-54
  - Added random delays (500ms-3000ms) to simulate network latency (a simplified sketch of this setup follows the list)
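For transparency, this is roughly the shape of the test harness used for the 200-request run (a simplified sketch, not the exact script; the URLs are placeholders):

import { setTimeout } from 'timers/promises';
import { BasicCrawler } from 'crawlee';

// ~200 requests, up to 50 handled in parallel, each with a 500-3000 ms
// simulated network delay.
const crawler = new BasicCrawler({
    maxConcurrency: 50,
    async requestHandler({ request, log }) {
        const delay = 500 + Math.floor(Math.random() * 2500);
        await setTimeout(delay);
        log.info(`Handled ${request.url} after ${delay} ms`);
    },
});

const urls = Array.from({ length: 200 }, (_, i) => `https://example.com/page-${i}`);
await crawler.run(urls);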
In all test cases:
- No failed requests
- No duplicate processing
- No lost requests
- Proper ordering of request processing
- System resources were well-managed (occasional event loop overloads, but the system recovered)
The RequestQueue implementation appears to be handling concurrent access correctly:
- Requests are properly locked while being processed
- No race conditions observed
- Queue maintains proper ordering
- Failed requests are properly retried
Could you provide more details about:
- The specific conditions under which you're seeing the lock issues?
- Any error messages or symptoms you're observing?
- Your crawler configuration (concurrency, request handler timeout, etc.)?
- Whether you're seeing this in a specific environment (local, cloud, etc.)?
This would help us better understand and reproduce the issue you're experiencing.
One more thing: I have been trying to reproduce this for the past hour, so please provide more details.
Imo the minimal reproduction scenario is here:
import { setTimeout } from "timers/promises";
import { Configuration, RequestQueueV2 } from '@crawlee/core';
import { Worker, isMainThread } from 'worker_threads';

// Seed the default request queue with three requests (main thread only).
async function initializeRq() {
    const requestQueue = await RequestQueueV2.open(null);
    await requestQueue.addRequests([
        { url: 'https://example.com/0' },
        { url: 'https://example.com/1' },
        { url: 'https://example.com/2' },
    ]);
}

async function main() {
    const requestQueue = await RequestQueueV2.open(null, {
        config: new Configuration({
            purgeOnStart: false,
        }),
    });
    // Shorten the lock time so the expiry is easy to hit.
    requestQueue.requestLockSecs = 1;

    // The label is 2 for the main thread and 1 for the worker thread.
    console.log(`[${isMainThread + 1}] ${(await requestQueue.fetchNextRequest())?.url} [${Date.now()}]`);
    await setTimeout(1000);
    console.log(`[${isMainThread + 1}] ${(await requestQueue.fetchNextRequest())?.url} [${Date.now()}]`);
    await setTimeout(1000);
    console.log(`[${isMainThread + 1}] ${(await requestQueue.fetchNextRequest())?.url} [${Date.now()}]`);
}

if (isMainThread) {
    // Seed the queue, spawn a second consumer in a worker thread, then wait
    // long enough for the worker's 1s locks to expire.
    await initializeRq();
    new Worker(new URL(import.meta.url));
    await setTimeout(2000);
}

main();
The main function fetches three requests from the default RequestQueueV2 instance and prints their URLs.
In the script, we run the main function twice - in the main thread and in the worker thread. The worker's lock on the first batch of requests (fetchNextRequest locks up to 25 requests at a time under the hood) elapses right when the main thread calls the first fetchNextRequest, which causes both to access the same requests at the same time (each thinking it has them exclusively locked).
See the output of this script:
[1] https://example.com/0 [1743879224089]
[1] https://example.com/2 [1743879225094]
[2] https://example.com/1 [1743879225692]
[1] https://example.com/1 [1743879226106]
[2] undefined [1743879226697]
[2] https://example.com/0 [1743879227708]
Note that the worker thread ([1]) accessed https://example.com/1 at 1743879226106, i.e. 414 milliseconds after the main thread ([2]) fetched it. This violates the requestLockSecs setting (1 second) we set for both RequestQueue instances.
Hey @barjin, thanks for the guide. I already have one PR open; once that gets done, I will make a PR to fix this issue.
I ran into a bug in RequestQueue that is causing some Requests to be "successfully" handled multiple times.
I am not sure whether it is related to the SDK or to the Apify Platform. It happens on the Apify Platform with the default RequestQueue (V2 with locking) without any extra settings; it doesn't happen with the crawler option:
experiments: {
    requestLocking: false,
},
When this option is set, everything works as expected.
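For context, this is where the option sits in the crawler configuration (a simplified sketch; CheerioCrawler is used here only as an example, the option should apply the same way to any BasicCrawler subclass):

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // Opting out of the locking-based RequestQueue (V2) works around the duplicate handling.
    experiments: {
        requestLocking: false,
    },
    async requestHandler({ request, log }) {
        log.info(`Processing ${request.url}`);
    },
});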
- Minimal repro repo: https://github.com/JJetmar/big-bad-rq2
- Run: here is a run on the Apify Platform.
Based on the log, 8 requests were handled, but according to the RQ only 5 of them were created, so some of them were handled multiple times.
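For the record, this is roughly how I compared the two numbers (a sketch; requestQueue and crawler are the instances from the repro, and the exact stats fields may differ slightly between Crawlee versions):

// After the run: compare what the crawler reports as finished with what the
// request queue actually contains.
const queueInfo = await requestQueue.getInfo();
console.log('Crawler stats, requestsFinished:', crawler.stats.state.requestsFinished); // the log reported 8 handled
console.log('RQ totalRequestCount:', queueInfo?.totalRequestCount); // the queue only ever contained 5
console.log('RQ handledRequestCount:', queueInfo?.handledRequestCount);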