Playwright requires installation via `npx playwright install`
Which package is this bug report for? If unsure which one to select, leave blank
@crawlee/playwright (PlaywrightCrawler)
Issue description
When you install a project with the following package.json, it fails on first start and asks you to run `npx playwright install`.
It's not a great first experience to get a huge error on first run, so we should either:
- ensure that Playwright browsers are installed together with @crawlee/playwright, or
- document everywhere, most importantly on the Crawlee homepage, that this command needs to be run before Playwright can be started.
It's likely that to reproduce this, you first need to run `npx playwright uninstall` to get into a "new user state".
This probably also impacts all our CLI templates.
Code sample
{
  "name": "my-module",
  "version": "0.0.1",
  "dependencies": {
    "crawlee": "^3.0.0",
    "playwright": "*"
  },
  "type": "module",
  "scripts": {
    "start": "node main.js"
  },
  "author": "Me!"
}
Package version
3.7.1
Node.js version
v18.12.1
Operating system
MacOS
Apify platform
- [ ] Tick me if you encountered this issue on the Apify platform
I have tested this on the next release
no
Other context
No response
So you can reproduce this from some template, or by installing crawlee into an empty project? Because the templates are working fine on my end.
I believe the browsers are installed via postinstall hook nowadays, cc @vladfrangu
Yep, both apify and crawlee templates have a postinstall hook (that also ensures it won't run in our docker images, but will run everywhere else)
We should probably document the CLI command for users who are upgrading to a newer playwright or are creating new projects without our CLI. We could even add a CLI command to auto-fix old projects (npx crawlee migrate-new-playwright?)
Hmm, probably not worth introducing a new command just to wrap an existing playwright command that's documented in the error.
So all the "default" and "new user" paths of installing crawlee are covered with this then? And I was just unlucky because I reinstalled an old project?
This is fixed for any users who create their project via apify create or crawlee create. Otherwise, the postinstall hook needs to be added to the project (which is why I suggested making a command for it, to automate it for users)
Can't we have it on the @crawlee/playwright package?
Well...we install the package all the time, so running the command when people don't use playwright isn't ideal either... Not sure what the best solution is
Hmm but in the end, we want this to work with the crawlee package too, same for puppeteer. The browsers used to be installed before too, right?
Can we have some env var to skip the downloads in the postinstall script? I'd probably just install them all the time and allow opting out, that was the previous behavior before all this mess happened.
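To make the opt-out idea above concrete, here is a minimal sketch of a postinstall script that downloads the browsers by default and skips the download when an environment variable is set. The variable name `SKIP_BROWSER_INSTALL`, the file name `postinstall.js`, and both function names are assumptions made up for illustration; they are not existing Crawlee or Playwright settings. The only real command used is `npx playwright install`, the one the error message recommends.

```javascript
// postinstall.js: sketch of "install browsers by default, allow opting out".

// Hypothetical opt-out variable; the name is an assumption for this example.
const OPT_OUT_VARIABLE = 'SKIP_BROWSER_INSTALL';

// Returns the command to run, or null when the user opted out.
function browserInstallCommand(env) {
  const value = String(env[OPT_OUT_VARIABLE] ?? '').toLowerCase();
  const optedOut = value === '1' || value === 'true';
  return optedOut ? null : ['npx', 'playwright', 'install'];
}

// Downloads the browsers unless the opt-out variable is set.
// Resolves to the exit code of the install command (0 when skipped).
async function runPostinstall(env = process.env) {
  const command = browserInstallCommand(env);
  if (command === null) {
    console.log(`${OPT_OUT_VARIABLE} is set, skipping browser download.`);
    return 0;
  }
  const { spawnSync } = await import('node:child_process');
  const result = spawnSync(command[0], command.slice(1), {
    stdio: 'inherit',
    // npx is a .cmd shim on Windows, so it has to go through a shell there.
    shell: process.platform === 'win32',
  });
  return result.status ?? 1;
}
```

To wire it up, one could call `runPostinstall()` at the end of the file and point the `postinstall` script in package.json at it, for example `"postinstall": "node postinstall.js"`. Whether the hook should live in the project or in a package is exactly the open question in this thread.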
I am getting the dreaded:
╔═════════════════════════════════════════════════════════════════════════╗
║ Looks like Playwright Test or Playwright was just installed or updated. ║
║ Please run the following command to download new browsers: ║
║ ║
║ npx playwright install ║
║ ║
║ <3 Playwright Team ║
╚═════════════════════════════════════════════════════════════════════════╝
With the only code change being adding a new express route. I also defined 2 request queues following some internet skim reading. Running locally everything works as expected, but this issue is occurring via GCP Cloud Run.
I am using apify/actor-node-playwright-chrome:18 in my Dockerfile.
My logs show this error:
browserType.launchPersistentContext: Executable doesn't exist at /home/myuser/pw-browsers/chromium-1091/chrome-linux/chrome
and having pulled down the image and running locally via docker I can confirm that the only browsers present in pw-browsers are the following:
# cd pw-browsers
# ls
chrome chromium-1097 ffmpeg-1009
New route:
app.get("/lemon", async (req, res, message) => {
  const targetLink = req.query.link;
  if (!targetLink) {
    throw new Error('The link query parameter is required in order to know which lemon to crawl.');
  }
  const startUrl = `${targetLink}`;
  console.log(`We've received the lemon to crawl as: ${startUrl}`);
  const crawler = new PlaywrightCrawler(
    {
      requestHandler: router.getHandler('TANGY_LEMON'),
      minConcurrency: 5,
      requestQueue: lemonRequestQueue
    },
    new Configuration({
      persistStorage: false,
    })
  );
  await crawler.run([startUrl]);
  const crawlerOutput = await crawler.getData();
  return res.send(crawlerOutput);
});
Any advice on how to resolve or if this is unrelated would be amazing. Before I had simply followed the documentation instructions with a top-level express.js route. I am using a specific handler needed only for lemon as the top-level route is scraping a more broad tree of pages where the final outcome is lemon but I need to be able to request a specific crawl of a lemon using my route. Also please don't bully the choice of a query param here, quick & lazy was the thought.
Sounds like your playwright version doesn't match the one we use when building images. You should specify it in the image version tag (so you'd have apify/actor-node-playwright-chrome:18-1.40.0 for playwright 1.40.0, as an example). That should solve the issue, but please follow up if it doesn't!
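For illustration, the tag format from the comment above can be derived from an exact Playwright version with a small helper. This is only a sketch: `playwrightImageTag` is a hypothetical function, not part of Crawlee or the Apify tooling, and it assumes the `<node major>-<playwright version>` pattern from the example tag holds for the version you need.

```javascript
// Matches an exact version such as "1.40.0" (no ^, ~, * or other range syntax).
const EXACT_VERSION = /^\d+\.\d+\.\d+$/;

// Builds an image tag in the "<node major>-<playwright version>" form.
// Throws on ranges, because a range does not identify a single browser build.
function playwrightImageTag(nodeMajor, playwrightVersion) {
  if (!EXACT_VERSION.test(playwrightVersion)) {
    throw new Error(`Expected an exact Playwright version, got "${playwrightVersion}"`);
  }
  return `apify/actor-node-playwright-chrome:${nodeMajor}-${playwrightVersion}`;
}

console.log(playwrightImageTag(18, '1.40.0'));
// prints "apify/actor-node-playwright-chrome:18-1.40.0"
```

Rejecting ranges up front mirrors the root cause here: the image ships the browsers for one specific Playwright release, so the version in package.json has to resolve to that same release.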
In fairness, I was using a wildcard for the playwright version in my package.json - I fixed it to ^1.40.0 as is the case for @playwright/test and still not resolved :(
Error message is the same as above regarding missing browser chromium-1091
If you use a range like that, it'll still install the latest version that matches. You'd need to either use ~ for the range or a fixed version 😅
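To show why `^1.40.0` can still pull in a newer Playwright than the image provides, here is a deliberately simplified range check. It is a sketch, not npm's semver implementation: it only handles exact versions and `^` / `~` prefixes on plain `major.minor.patch` versions with a major of 1 or higher, and the version numbers are just illustrative.

```javascript
// Parses "1.40.0" into numeric parts.
function parseVersion(version) {
  const [major, minor, patch] = version.split('.').map(Number);
  return { major, minor, patch };
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a, b) {
  return a.major - b.major || a.minor - b.minor || a.patch - b.patch;
}

// Simplified check: exact, caret (same major) and tilde (same major and minor).
// Assumes major >= 1; caret ranges behave differently for 0.x versions.
function satisfies(version, range) {
  const operator = range[0] === '^' || range[0] === '~' ? range[0] : '';
  const base = parseVersion(operator ? range.slice(1) : range);
  const candidate = parseVersion(version);
  if (operator === '') return compareVersions(candidate, base) === 0;
  if (compareVersions(candidate, base) < 0) return false;
  if (operator === '^') return candidate.major === base.major;
  return candidate.major === base.major && candidate.minor === base.minor;
}

console.log(satisfies('1.45.0', '^1.40.0')); // prints true: a newer minor still matches
console.log(satisfies('1.45.0', '~1.40.0')); // prints false: tilde only allows patch updates
console.log(satisfies('1.40.2', '~1.40.0')); // prints true
console.log(satisfies('1.45.0', '1.40.0'));  // prints false: fixed version
```

With the caret range, a fresh install resolves to whatever the newest matching release is at that moment, which is how the project can end up looking for a browser revision that the image does not contain.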
If you're able to make a reproducible sample in a repository that'd help a bunch too!
I will move this to the relevant thread, having narrowed the problem down to the Dockerfile, or at least to this element of my pipeline.
Confirmed by simply rebuilding and redeploying an unchanged project (i.e. expected to be the equivalent of a rollback) and still getting the same error about the missing browser chromium-1091. I will now try pinning the version of playwright, using the latest apify docker image, or both (please don't make me create and serve my own base image... such overkill, suggested in the above thread by another user).
I am getting the same error
2024-06-30T20:00:31.110Z Error occurred browserType.launch: Executable doesn't exist at /home/myuser/pw-browsers/chromium-1117/chrome-linux/chrome
2024-06-30T20:00:31.112Z ╔═════════════════════════════════════════════════════════════════════════╗
2024-06-30T20:00:31.114Z ║ Looks like Playwright Test or Playwright was just installed or updated. ║
2024-06-30T20:00:31.115Z ║ Please run the following command to download new browsers: ║
2024-06-30T20:00:31.117Z ║ ║
2024-06-30T20:00:31.120Z ║ npx playwright install ║
2024-06-30T20:00:31.121Z ║ ║
2024-06-30T20:00:31.123Z ║ <3 Playwright Team ║
2024-06-30T20:00:31.125Z ╚═════════════════════════════════════════════════════════════════════════╝
2024-06-30T20:00:31.126Z at scheduleLadder (/home/myuser/dist/main.js:295:34)
2024-06-30T20:00:31.128Z at main (/home/myuser/dist/main.js:375:26)
2024-06-30T20:00:31.130Z at /home/myuser/async file:/home/myuser/dist/main.js:378:1 {
2024-06-30T20:00:31.133Z name: 'Error'
2024-06-30T20:00:31.135Z }