PlaywrightCrawler doesn't have gotoOptions
In the JavaScript version, PuppeteerCrawler has gotoOptions, which I believe lets you choose which waitUntil state navigation should wait for.
https://crawlee.dev/js/api/puppeteer-crawler#PuppeteerGoToOptions
PlaywrightCrawler, on the other hand, calls page.goto with its default options, so navigation always waits for the "load" event.
https://github.com/apify/crawlee-python/blob/9d4ae6439c301abe7439281a5786b8f166d67623/src/crawlee/crawlers/_playwright/_playwright_crawler.py#L300C1-L301C1
Some sites take ages to load, and I would like my request_handler to run after "domcontentloaded", since I don't need the full page load to get what I need. As it is now, my request_handler is never called, because the site has an issue that prevents it from loading all the way.
I don't just want to increase the timeout; I want to be able to specify which options _navigate passes to page.goto.
Hello @phughesion-h3, and thanks for using Crawlee for Python 🙂 In the JS version, a PuppeteerGoToOptions (or PlaywrightGoToOptions) object is passed to the pre-navigation hooks, which may modify it and thereby configure how page.goto behaves.
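To make that JS mechanism concrete, here is a rough Python sketch of the same pattern: each pre-navigation hook receives a context holding a mutable options dict, and whatever the hooks leave in it is forwarded to page.goto. StubPage, PreNavContext, goto_options and navigate are hypothetical names invented for illustration (they are not Crawlee's API), and StubPage only records calls so the sketch runs without a browser.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


class StubPage:
    """Stand-in for a Playwright Page; records the arguments passed to goto()."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, dict[str, Any]]] = []

    async def goto(self, url: str, **kwargs: Any) -> None:
        self.calls.append((url, kwargs))


@dataclass
class PreNavContext:
    """Hypothetical pre-navigation context carrying a mutable goto_options dict."""

    url: str
    page: StubPage
    goto_options: dict[str, Any] = field(default_factory=dict)


Hook = Callable[[PreNavContext], Awaitable[None]]


async def navigate(url: str, page: StubPage, hooks: list[Hook]) -> None:
    context = PreNavContext(url=url, page=page)
    # Each hook may mutate context.goto_options in place, like JS hooks do with gotoOptions.
    for hook in hooks:
        await hook(context)
    # Whatever the hooks left in the dict is forwarded to page.goto as keyword arguments.
    await page.goto(context.url, **context.goto_options)


async def use_domcontentloaded(context: PreNavContext) -> None:
    context.goto_options["wait_until"] = "domcontentloaded"


async def main() -> StubPage:
    page = StubPage()
    await navigate("https://example.com", page, [])
    await navigate("https://example.com/slow", page, [use_domcontentloaded])
    return page


page = asyncio.run(main())
print(page.calls)
```

With no hooks, goto receives no options, so Playwright's own default (wait_until="load") would apply; the second call forwards wait_until="domcontentloaded".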
As you wrote, this functionality is currently missing from the Python version; we will fix that.
One open question: configuring how page.goto behaves by modifying an argument passed to the pre-navigation hooks is not great for user discoverability. Is there a better approach? @vdusek @Pijukatel @Mantisus
Perhaps the options could be one of PlaywrightCrawler's input parameters and then be passed to PlaywrightPreNavCrawlingContext, so that users can change them for specific URLs in a pre_navigation_hook.
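A minimal sketch of that proposal, assuming a crawler-wide default set in the constructor that each request starts from and that a pre-navigation hook can override per URL. SketchCrawler, GotoOptions, goto_options and resolve_goto_options are hypothetical names, not Crawlee's API; the GotoOptions keys mirror the keyword arguments of Playwright's page.goto (wait_until, timeout, referer).

```python
import asyncio
from typing import Awaitable, Callable, Literal, Optional, TypedDict


class GotoOptions(TypedDict, total=False):
    """Hypothetical options type mirroring Playwright's page.goto keyword arguments."""

    wait_until: Literal["commit", "domcontentloaded", "load", "networkidle"]
    timeout: float  # milliseconds, as in Playwright
    referer: str


class PreNavContext:
    def __init__(self, url: str, goto_options: GotoOptions) -> None:
        self.url = url
        self.goto_options = goto_options


Hook = Callable[[PreNavContext], Awaitable[None]]


class SketchCrawler:
    """Hypothetical crawler: goto_options is a constructor parameter, overridable per URL."""

    def __init__(self, goto_options: Optional[GotoOptions] = None) -> None:
        self._goto_options: GotoOptions = goto_options or {}
        self._hooks: list[Hook] = []

    def pre_navigation_hook(self, hook: Hook) -> Hook:
        self._hooks.append(hook)
        return hook

    async def resolve_goto_options(self, url: str) -> GotoOptions:
        # Copy the crawler-wide defaults so a per-URL override cannot leak into other requests.
        context = PreNavContext(url, self._goto_options.copy())
        for hook in self._hooks:
            await hook(context)
        return context.goto_options


crawler = SketchCrawler(goto_options={"wait_until": "domcontentloaded"})


@crawler.pre_navigation_hook
async def wait_longer_for_spa(context: PreNavContext) -> None:
    # Per-URL override: this hypothetical site needs the network to go quiet first.
    if "spa.example.com" in context.url:
        context.goto_options["wait_until"] = "networkidle"


async def main() -> tuple[GotoOptions, GotoOptions]:
    regular = await crawler.resolve_goto_options("https://example.com")
    spa = await crawler.resolve_goto_options("https://spa.example.com/app")
    return regular, spa


regular, spa = asyncio.run(main())
print(regular)
print(spa)
```

Copying the defaults per request is the key design choice here: hooks can still mutate options freely, but the constructor parameter stays the discoverable, crawler-wide default.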
UPD: This should probably be done after PR #1474 is completed, to make sure the navigation options do not conflict with request_handler_timeout.