Enqueue strategy check after redirects is not working with adaptive crawler
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/playwright (PlaywrightCrawler)
Issue description
Call enqueueLinks() without any parameters in the request handler while crawling https://crawlee.dev/. At some point the crawler escapes the domain and starts scraping everything.
https://console.apify.com/actors/PFaajt3k6oOp1YRAU/runs/0SfY5Ocr1dgQjhSIS#log
Code sample
import { PlaywrightCrawler } from 'crawlee';
import { Actor } from 'apify';

await Actor.init();

const crawler = new PlaywrightCrawler({
    proxyConfiguration: await Actor.createProxyConfiguration(),
});

crawler.router.addDefaultHandler(async (ctx) => {
    const $ = await ctx.parseWithCheerio();
    const title = $('html title').text();
    const h1 = $('body h1').text();
    const proxy = ctx.proxyInfo?.username;

    ctx.log.info(`processing ${ctx.request.url}`, { title, h1, proxy });
    await ctx.pushData({ url: ctx.request.url, title, h1 });
    await ctx.enqueueLinks();
});

await crawler.run(['https://crawlee.dev/']);
await Actor.exit();
Package version
3.10.3 beta
Node.js version
20
Operating system
No response
Apify platform
- [X] Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
No response
Other context
No response
Thanks for the report! Do you know whether some page in the Crawlee docs redirects elsewhere, or is the actual enqueueStrategy check failing (rather than the post-redirect check)?
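For anyone following along, the two checks mentioned in this comment can be told apart roughly as below. This is only a minimal sketch, not Crawlee's actual implementation: the helper names (matchesStrategy, filterLinks, redirectedOutOfScope) are made up for illustration, the URLs are example inputs, and the same-domain strategy is left out because it needs registrable-domain parsing.

```javascript
// Illustrative only: these helpers are hypothetical, not Crawlee internals.

// Returns true when `candidateUrl` is allowed relative to `baseUrl`
// under a simplified version of the given strategy.
function matchesStrategy(strategy, baseUrl, candidateUrl) {
    const base = new URL(baseUrl);
    const candidate = new URL(candidateUrl);
    switch (strategy) {
        case 'all':
            return true;
        case 'same-hostname':
            return base.hostname === candidate.hostname;
        case 'same-origin':
            return base.origin === candidate.origin;
        default:
            throw new Error(`Unsupported strategy in this sketch: ${strategy}`);
    }
}

// Check 1 (enqueue time): filter the links found on a page before queueing them.
function filterLinks(strategy, pageUrl, links) {
    return links.filter((link) => matchesStrategy(strategy, pageUrl, link));
}

// Check 2 (post-redirect): after navigation, compare the URL that was
// requested with the URL that actually loaded.
function redirectedOutOfScope(strategy, requestedUrl, loadedUrl) {
    return !matchesStrategy(strategy, requestedUrl, loadedUrl);
}
```

If off-domain links show up in the queue even though no request was redirected, the first check is the one to look at; if they only appear after a redirect, the second one is.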
Looking at the storage, it feels like it's not about redirects; the "Edit this page" links are in there too.
A few more links here; I don't think they come from redirects either.
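A quick way to confirm this from the stored results is to tally the items per hostname. The sketch below assumes the items have the { url, title, h1 } shape pushed by the code sample; countByHostname is a hypothetical helper and the input is example data, not output from the linked run.

```javascript
// Hypothetical helper: count stored items per hostname so that
// off-domain entries stand out.
function countByHostname(items) {
    const counts = new Map();
    for (const { url } of items) {
        const { hostname } = new URL(url);
        counts.set(hostname, (counts.get(hostname) ?? 0) + 1);
    }
    return counts;
}

// Example input only; real items would come from the run's dataset.
const exampleItems = [
    { url: 'https://crawlee.dev/' },
    { url: 'https://crawlee.dev/docs' },
    { url: 'https://github.com/apify/crawlee' },
];
const exampleCounts = countByHostname(exampleItems);
```

Any hostname other than crawlee.dev in the result means a link outside the start domain was enqueued and processed.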
It almost feels like the adaptive enqueueLinks is not checking the strategies at all, so maybe this has nothing to do with the post-redirect check.