playwright-aws-lambda Download event not caught and always times out

Open imhashir opened this issue 3 years ago • 12 comments

Thanks a bunch for creating this awesome package.

I'm having an issue with the download event. My code works great when I run it locally (serverless invoke local), but after I deploy it via serverless deploy, waitForEvent('download') times out.

Here's the code:

  const { page, browser } = await openWebpage(URL);

  const [download] = await Promise.all([
    // Start waiting for the download
    page.waitForEvent('download'),
    // Perform the action that initiates download
    page.click(`#${BTN_ID}`),
  ]);

Here's the openWebpage function:

export async function openWebpage(url) {
  const browser = await playwright.launchChromium();
  const context = await browser.newContext({
    acceptDownloads: true,
  });

  const page = await context.newPage();
  await page.goto(url);

  return { page, browser: context };
}
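
A minimal sketch of the same click-and-wait pattern with an explicit timeout, so that a missing event fails with a descriptive error in the Lambda logs. It does not fix the underlying problem. `clickAndWaitForDownload` and its `timeoutMs` parameter are hypothetical names, not part of Playwright or playwright-aws-lambda; `page.waitForEvent` with a `timeout` option and `page.click` are the only Playwright calls used.

```javascript
// Hypothetical helper (not part of Playwright or playwright-aws-lambda).
// `page` is a Playwright Page; `timeoutMs` bounds the wait for the event.
async function clickAndWaitForDownload(page, selector, timeoutMs = 15000) {
  try {
    const [download] = await Promise.all([
      // Start waiting before clicking so the event cannot be missed.
      page.waitForEvent('download', { timeout: timeoutMs }),
      // Perform the action that initiates the download.
      page.click(selector),
    ]);
    return download;
  } catch (err) {
    // Add context so the log shows which step failed.
    throw new Error(`No download event after clicking ${selector}: ${err.message}`);
  }
}
```

With the code above, usage would look like `const download = await clickAndWaitForDownload(page, '#' + BTN_ID);`.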

A similar issue was posted in playwright's official repo here. In that same issue, I've commented about my issue as well, here.

My guess is that this package was based on chrome-aws-lambda, which targets puppeteer, and since puppeteer does not support the download event, it wasn't included in this package either. But that's just a guess. I'd love to help in any way to get this issue fixed.

Hope to hear from you soon.

Jan 23 '21 21:01 imhashir

I am also seeing the same timeout error when running in Lambda.

TimeoutError: Timeout while waiting for event "download"
Note: use DEBUG=pw:api environment variable and rerun to capture Playwright logs.

"playwright-aws-lambda": "^0.6.0",
"playwright-core": "^1.8.0",

Jan 23 '21 23:01 Madhu1512

I feel the issue is with the version of Chrome packaged with this library. For now, I switched to the new Docker container functionality in Lambda and am able to process the downloads without any issue.

Jan 24 '21 06:01 Madhu1512

Amazing. Are you doing it via Serverless or bare lambda? Can you guide me through the process or share some code snippet? Thank You.

Jan 24 '21 06:01 imhashir

Wow, I am also interested. Can you please guide us through the process or share a code snippet? Thank you very much and have a nice day.

Jan 24 '21 08:01 osmenia

Awesome, it would be great if you could guide us here.

Thanking you in anticipation.

Jan 24 '21 22:01 anupsunni

Here is an example I put together for running Playwright in a Lambda Docker container.

https://github.com/Madhu1512/playwright-lambda-demo

Jan 25 '21 21:01 Madhu1512

Thanks a lot @Madhu1512 for going through the effort of creating an example for us. I'll have to look into docker based lambda deployments to get that to work but I'll definitely try your solution. For now, I could get it to work by downgrading playwright-core to 1.0.2 as suggested by @osmenia in https://github.com/microsoft/playwright/issues/3726#issuecomment-767374664

Jan 26 '21 22:01 imhashir

@austinkelleher, can you please update Chromium? See https://github.com/microsoft/playwright/issues/3726#issuecomment-767254216

Jan 29 '21 19:01 osmenia

Hi all!

Any news? I'm stuck on this error.

I'm on AWS Lambda, of course, with a Node.js > 16 runtime.

Here's my package.json:

"dependencies": {
  "playwright-aws-lambda": "^0.9.0",
  "playwright-core": "^1.26.0"
}

I tried downgrading playwright-core to the suggested 1.2.0, but then I would need to refactor all the code, since locators do not exist in such an old version.

Any suggestions? Note that I've also tried to dispatch the click event "manually", without success.

What I need to achieve is to save the downloaded file to /tmp/ so I can parse it (it's a CSV) later on.

Finally, here's the code (locally it works flawlessly):

const playwright = require('playwright-aws-lambda')

const extractData = async () => {
  const browser = await playwright.launchChromium()
  const cxt = await browser.newContext()
  const page = await cxt.newPage()

  // Log in
  await page.goto('https://<TheTargetSite>/auth/login');
  await page.locator('input[type="email"]').click();
  await page.locator('input[type="email"]').fill('[email protected]');
  await page.locator('input[type="password"]').click();
  await page.locator('input[type="password"]').fill('YYYYY');
  await page.locator('button:has-text("Log in")').click();

  // Filter the rides and open the export dialog
  await page.locator('a:has-text("Rides")').click();
  await page.locator('text=ActiveStatus').click();
  await page.locator('text=Ended').click();
  await page.locator('[placeholder="Start date"]').click();
  await page.locator('[aria-label="September 19\\, 2022"]').click();
  await page.locator('[aria-label="September 19\\, 2022"]').click();
  await page.locator('[aria-label="Export rides"]').click();

  // Wait for the download while clicking the button that triggers it
  const [download] = await Promise.all([
    page.waitForEvent('download'),
    page.locator('button:has-text("Export")').click()
  ]);
  await download.saveAs('/tmp/rides.csv')
  await page.close()
  await cxt.close()
  await browser.close()
}

module.exports = { extractData }

Sep 27 '22 14:09 CRSylar

I was able to work around this issue by setting a fixed /tmp folder for Chrome to output its temporary files, then watching the directory for the PDF to arrive.

Obviously not perfect for every situation, but it works well for us, since the PDF download is reliable and there is only one download per session.

For example:

   const tmpFolder = "/tmp/pdfs/" + uuid();
   const browser = await playwright.launchChromium({downloadsPath: tmpFolder});
   const context = await browser.newContext();
   const page = await context.newPage();

   ...

   await page.getByText("Download PDF").click();
   let pdfFiles: string[] = [];

   while (!pdfFiles.length) {
      await page.waitForTimeout(1000);
      pdfFiles = fs.readdirSync(tmpFolder);
   }

   const pdfData = fs.readFileSync(`${tmpFolder}/${pdfFiles[0]}`);
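
The polling loop above never gives up if no file arrives. Here is a rough sketch of the same directory-watching idea with a deadline, using only Node's built-in modules. `waitForFirstFile` and its `timeoutMs` / `intervalMs` options are hypothetical names for illustration. One limitation I'm assuming rather than solving: a file may show up before it is fully written, so a size-stability check might still be needed.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Hypothetical helper: poll `dir` until it contains a file, or give up.
async function waitForFirstFile(dir, { timeoutMs = 30000, intervalMs = 500 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    // The directory may not exist yet; treat that as "no files so far".
    const names = fs.existsSync(dir) ? fs.readdirSync(dir) : [];
    if (names.length > 0) {
      return path.join(dir, names[0]);
    }
    // Sleep without needing a Playwright page.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`No file appeared in ${dir} within ${timeoutMs} ms`);
}
```

With the snippet above, this could replace the while loop: `const pdfData = fs.readFileSync(await waitForFirstFile(tmpFolder));`.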

Feb 09 '23 18:02 SamLoy

Hi SamLoy,

I have used the /tmp folder in the past in Lambda functions to store temporary files. In this case, did you have to pre-create the folder on every run, or add it to your source code in AWS Lambda? Or was the snippet above enough on its own?

Many thanks for your suggestion and help.

Regards,
Dan
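
This thread does not settle whether Playwright creates the downloadsPath folder by itself. As a defensive sketch, creating the folder up front is harmless either way. `makeDownloadDir` is a hypothetical helper name, and `randomUUID` from Node's crypto module stands in for the `uuid()` call in the snippet. A unique subfolder per run should also avoid picking up files left over from an earlier invocation if the environment is reused.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');
const { randomUUID } = require('node:crypto');

// Hypothetical helper: create a unique per-invocation folder under `baseDir`.
function makeDownloadDir(baseDir = path.join(os.tmpdir(), 'pdfs')) {
  const dir = path.join(baseDir, randomUUID());
  // `recursive: true` creates missing parents and does not throw if they exist.
  fs.mkdirSync(dir, { recursive: true });
  return dir;
}
```

The result could then be passed on as `launchChromium({ downloadsPath: makeDownloadDir() })`.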

Jul 28 '23 05:07 TheAPIguys

@SamLoy Very good idea. I successfully ran playwright-aws-lambda on Vercel and was then able to download the file.

Dec 18 '23 02:12 zhw2590582
