crawlee
Multiple crawler instances share `useState` state
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/basic (BasicCrawler)
Issue description
When multiple crawler instances are created in the same process, their useState methods (both on the crawler instance and in the requestHandler context param) always resolve to the same state object.
This is not what the API suggests: crawler.useState reads as though it resolves to state internal to that crawler. If sharing state across instances is the intended behavior, it IMO requires better docs.
Code sample
import { CheerioCrawler } from '@crawlee/cheerio';

async function main() {
    function createCrawler() {
        return new CheerioCrawler({
            requestHandler: async ({ request, useState }) => {
                const state = await useState<string[]>([]);
                state.push(request.url);
            },
        });
    }

    const [crawler1, crawler2] = [createCrawler(), createCrawler()];
    await crawler1.run(['https://example.com']);
    await crawler2.run(['https://example.org']);

    console.log(crawler1 === crawler2); // false
    console.log(await crawler1.useState() === await crawler2.useState()); // true
    console.log(await crawler1.useState()); // ['https://example.com', 'https://example.org']
}

main();
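To illustrate the behavior without depending on Crawlee or the network, here is a simplified model. It is not Crawlee's actual implementation; it assumes that useState reads from a single process-wide store under one fixed key, which would explain the output above. ModelCrawler, sharedStore, FIXED_KEY and the stateKey option are all hypothetical names introduced for this sketch.

```typescript
// Simplified model, NOT Crawlee's real code.
// Assumption: every crawler reads its state from one process-wide store
// under the same fixed key.
const sharedStore = new Map<string, unknown>();
const FIXED_KEY = 'STATE'; // hypothetical key name

class ModelCrawler {
    private readonly stateKey: string;

    // `stateKey` is a hypothetical option; by default every instance
    // falls back to the same key, which is what causes the sharing.
    constructor(stateKey: string = FIXED_KEY) {
        this.stateKey = stateKey;
    }

    async useState<T>(defaultValue: T): Promise<T> {
        // The default is stored only on first access; later callers using
        // the same key get the already stored object back.
        if (!sharedStore.has(this.stateKey)) {
            sharedStore.set(this.stateKey, defaultValue);
        }
        return sharedStore.get(this.stateKey) as T;
    }
}

async function demo(): Promise<{ shared: boolean; isolated: boolean }> {
    // Two instances with the default key resolve to the same array.
    const a = new ModelCrawler();
    const b = new ModelCrawler();
    const stateA = await a.useState<string[]>([]);
    const stateB = await b.useState<string[]>([]);
    const shared = stateA === stateB;

    // Two instances with distinct keys each get their own array.
    const c = new ModelCrawler('crawler-c');
    const d = new ModelCrawler('crawler-d');
    const stateC = await c.useState<string[]>([]);
    const stateD = await d.useState<string[]>([]);
    const isolated = stateC !== stateD;

    return { shared, isolated };
}
```

In this model, giving each instance its own key is enough to isolate the state. Whether the actual fix takes that approach is not stated in this report.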
Package version
3.13.8
Node.js version
Node 22
Operating system
Linux
Apify platform
- [ ] Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response
Closed by #3309 in Crawlee v4