Add a utility function to take screenshots from Puppeteer larger than 16k pixels

Open jancurn opened this issue 6 years ago • 6 comments

Basically, Puppeteer can only take screenshots with a width or height of at most 16,384px (this is a hard-coded Chrome limit, see https://github.com/GoogleChrome/puppeteer/issues/359). However, for one customer project, we need screenshots of size up to 50,000px. This can be implemented by taking multiple screenshots and stitching them together (e.g. see the example at https://github.com/GoogleChrome/puppeteer/pull/937/files), even though this approach might not work precisely all the time (e.g. the page content might move while scrolling down).

We could add this as a generic function to Apify SDK, e.g. under Apify.utils.puppeteer.captureScreenshot()
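The tiling step of the stitching approach described above can be sketched as follows. This is only an illustration under stated assumptions: `computeClips` is a hypothetical helper (not part of Puppeteer or Apify SDK), and the default tile size is simply the 16,384px limit mentioned above, ignoring the device scale factor.

```javascript
// Hypothetical helper: splits a page of pageWidth x pageHeight pixels into
// rows of clip rectangles, each no larger than maxTile in either dimension.
function computeClips(pageWidth, pageHeight, maxTile = 16 * 1024) {
    const rows = [];
    for (let y = 0; y < pageHeight; y += maxTile) {
        const row = [];
        for (let x = 0; x < pageWidth; x += maxTile) {
            row.push({
                x,
                y,
                // The last tile in each direction may be smaller than maxTile.
                width: Math.min(maxTile, pageWidth - x),
                height: Math.min(maxTile, pageHeight - y),
            });
        }
        rows.push(row);
    }
    return rows;
}

// Example: a 1280 x 50000 px page is split into four vertically stacked tiles.
const clips = computeClips(1280, 50000);
console.log(clips.length, clips[clips.length - 1][0]);
```

Each rectangle could then be passed as the `clip` option of `page.screenshot()`, and the resulting buffers stitched row by row.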

jancurn avatar Jul 09 '19 13:07 jancurn

Do we really need such an edge case in the SDK though?

mnmkng avatar Jul 09 '19 14:07 mnmkng

Well, I think taking screenshots is a super common use case in web scraping, and many pages (especially on mobile) are extremely long (more than 16k pixels), so it might make sense to have this in Apify SDK. I expect the function will be simple enough, and the dependencies can be optional. But perhaps you're right and we could add functions like this into separate modules...

jancurn avatar Jul 09 '19 14:07 jancurn

We already wanted to have a generic saveSnapshot function like the one in Web Scraper and similar scrapers. So if this does not require additional dependencies, I would put it in.

metalwarrior665 avatar Jul 09 '19 14:07 metalwarrior665

May I take a look at this? I am pretty new to open-source contribution and would love to try to implement this.

robintom avatar Oct 04 '19 05:10 robintom

Sure @robintom, go ahead. Try to first prepare some proof of concept with plain Puppeteer and we can guide you through integration into the SDK later.

Thanks and good luck!

mnmkng avatar Oct 04 '19 14:10 mnmkng

For inspiration, you can have a look at the code below. However, for inclusion in Apify SDK, it would need a bit more polishing, unit tests, a simpler interface, and documentation. Unfortunately, the code needs an additional dependency, `merge-img`, so we should discuss whether it's okay to include it or maybe just keep it as an optional or peer dependency.

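// NOTE: log(request, message) is not defined in this snippet; it is presumably
// a logging helper from the project this code was taken from.
async function captureFullScreenshot({page, request, path, maxWidth = 50000, maxHeight = 50000, label = 'CAPTURE-FULL-SCREENSHOT'}) {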
    // Wraps the callback-style getBuffer() of the image object returned by
    // merge-img in a promise that resolves to a PNG buffer.
    function getBufferPromise(jimp) {
        return new Promise((resolve, reject) => {
            jimp.getBuffer('image/png', (error, img) => {
                if (error) {
                    reject(error);
                } else {
                    resolve(img);
                }
            });
        });
    }

    const MAX_TEXTURE_SIZE = 16 * 1024;

    const client = await page.target().createCDPSession();
    const metrics = await client.send('Page.getLayoutMetrics');
    const pageWidth = Math.ceil(metrics.contentSize.width);
    const pageHeight = Math.ceil(metrics.contentSize.height);
    if (maxWidth > pageWidth) {
        log(request, `${label}: maxWidth ${maxWidth} is bigger than the page width ${pageWidth}, we will use the page width instead`);
        maxWidth = pageWidth;
    }
    if (maxHeight > pageHeight) {
        log(request, `${label}: maxHeight ${maxHeight} is bigger than the page height ${pageHeight}, we will use the page height instead`);
        maxHeight = pageHeight;
    }
    // if (maxHeight <= MAX_TEXTURE_SIZE && maxWidth <= MAX_TEXTURE_SIZE) {
    //     return await page.screenshot({fullPage: true})
    // }

    const dpr = page.viewport() ? page.viewport().deviceScaleFactor || 1 : 1;

    await client.send('Emulation.setDeviceMetricsOverride', {
        mobile: false,
        width: maxWidth,
        height: maxHeight,
        deviceScaleFactor: dpr,
        screenOrientation: {angle: 0, type: 'portraitPrimary'}
    });
    // Hardcoded max texture size of 16,384 (crbug.com/770769).
    // Use a quarter of it per tile so that each tile is captured correctly;
    // floor after dividing so the tile size is always a whole number of pixels.
    const maxScreenshotHeight = Math.floor(MAX_TEXTURE_SIZE / dpr / 4);
    const maxScreenshotWidth = Math.floor(MAX_TEXTURE_SIZE / dpr / 4);
    log(request, `${label}: pageWidth=${pageWidth}, pageHeight=${pageHeight}, maxWidth=${maxWidth}, maxHeight=${maxHeight}, maxScreenshotHeight=${maxScreenshotHeight}, maxScreenshotWidth=${maxScreenshotWidth}, dpr=${dpr}`);
    const buffersRows = [];
    for (let ypos = 0; ypos < maxHeight; ypos += maxScreenshotHeight) {
        const height = Math.min(maxHeight - ypos, maxScreenshotHeight);
        const buffersRow = [];
        for (let xpos = 0; xpos < maxWidth; xpos += maxScreenshotWidth) {
            const width = Math.min(maxWidth - xpos, maxScreenshotWidth);
            const buffer = await page.screenshot({
                clip: {
                    x: xpos,
                    y: ypos,
                    width,
                    height
                }
            });
            buffersRow.push({buffer, xpos, ypos})
        }
        buffersRows.push(buffersRow);
    }
    const mergeImg = require('merge-img');
    const mergedBuffersRows = [];
    for (const buffersRow of buffersRows) {
        if (buffersRow.length > 1) {
            const mergedBuffersRow = await mergeImg(buffersRow.map((buf) => buf.buffer))
                .then(getBufferPromise);
            mergedBuffersRows.push(mergedBuffersRow)
        } else {
            mergedBuffersRows.push(buffersRow[0].buffer)
        }
    }
    let ssBuffer;
    if (mergedBuffersRows.length > 1) {
        // direction: true makes merge-img stack the rows vertically.
        ssBuffer = await mergeImg(mergedBuffersRows, {direction: true})
            .then(getBufferPromise);
    } else {
        ssBuffer = mergedBuffersRows[0]
    }
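    // The original called an undefined writeFileAsync helper; fs.promises.writeFile is the built-in equivalent.
    if (path) await require('fs').promises.writeFile(path, ssBuffer);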
    return ssBuffer;
}
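On the question of keeping `merge-img` optional: one possible pattern is to load it lazily and fail with a clear message when it is not installed. The sketch below is only an illustration; `requireOptional` is a hypothetical helper, not an existing Apify SDK function.

```javascript
// Hypothetical helper: loads an optional dependency only when a feature
// actually needs it, and explains how to install it when it is missing.
function requireOptional(name, feature) {
    try {
        return require(name);
    } catch (err) {
        if (err.code === 'MODULE_NOT_FOUND') {
            throw new Error(
                `${feature} requires the optional package "${name}". `
                + `Install it with: npm install ${name}`,
            );
        }
        // Any other failure (e.g. a syntax error inside the package) is rethrown as-is.
        throw err;
    }
}
```

Inside the function above, `const mergeImg = require('merge-img');` could then become `const mergeImg = requireOptional('merge-img', 'captureFullScreenshot');`, so users who never take oversized screenshots would not need the package installed.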

jancurn avatar Oct 04 '19 15:10 jancurn

Closing as stale; the linked issue in Puppeteer is already closed too.

B4nan avatar Jul 17 '23 15:07 B4nan