headless-chrome-crawler

Is there a way to scroll?

Open • wemow opened this issue 4 years ago • 4 comments

What is the current behavior? No documented way of scrolling

What is the expected behavior? Being able to scroll

What is the motivation / use case for changing the behavior? Being able to load dynamically loaded (lazy-loaded) content by scrolling the page

wemow • Jan 13 '21 04:01

Sorry, @yemd, I cannot reach this library maintainer to get access to publishing updates. I'd recommend building a custom solution using puppeteer instead of using this library.
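
For anyone going the plain Puppeteer route, here is a rough sketch of what such a custom scroll loop could look like. The names scrollUntilStable, pauseMs and maxRounds are made up for this example and are not part of Puppeteer or this library; the only thing it relies on is a page object with an evaluate() method. It assumes the page grows taller whenever new content loads, which is not true for every site.

```javascript
// Sketch only: scrollUntilStable, pauseMs and maxRounds are hypothetical names.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function scrollUntilStable(page, { pauseMs = 1000, maxRounds = 20 } = {}) {
  let lastHeight = -1;
  for (let round = 0; round < maxRounds; round++) {
    // Runs in the browser: jump to the bottom and report the page height.
    const height = await page.evaluate(() => {
      const h = document.documentElement.scrollHeight;
      window.scrollTo(0, h);
      return h;
    });
    // Same height as last round: assume nothing new was loaded and stop.
    if (height === lastHeight) return round;
    lastHeight = height;
    // Give lazy-loaded content some time to arrive before checking again.
    await sleep(pauseMs);
  }
  // Gave up after maxRounds; the page may still be growing (endless feeds).
  return maxRounds;
}
```

With real Puppeteer you would call it after page.goto(...), e.g. await scrollUntilStable(page, { pauseMs: 1500 }), and then extract whatever you need with page.evaluate. The maxRounds cap is there so a truly infinite feed does not keep the crawler running forever.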

kulikalov • Jan 13 '21 08:01

Here is how I kept scrolling through a list of lazy-loaded products until the crawler reached the bottom of the page. I hope this helps :)

const productCrawler = await Crawler.launch({
  /*... */
});

await productCrawler.queue({
  url: '...',
  retryCount: 1,
  maxDepth: 3,
  depthPriority: false,
  waitUntil: 'networkidle0',
  jQuery: false,
  waitFor: {
    options: {},
    args: [config], // args for selectorOrFunctionOrTimeout
    selectorOrFunctionOrTimeout: function (config) {
      const documentHeight = document.documentElement.scrollHeight;

      window.scrollTo(0, documentHeight);

      // You might want to check if there are any elements still loading (look for spinners, other indicators, or just wait)
      // Return true if you are done scrolling, false otherwise

      return true; 
    },
  },
});

await productCrawler.onIdle();
await productCrawler.close();

If not, you can always scroll inside the evaluatePage function:

const productCrawler = await Crawler.launch({
  // ...
  evaluatePage: eval(`() => {
    const documentHeight = document.documentElement.scrollHeight;

    window.scrollTo(0, documentHeight);
  }`),
  // ...
})
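
The waitFor example above always returns true, so it scrolls once and stops. One way the "are we done yet?" check might be filled in is sketched below. This is an assumption-laden sketch, not the library's documented approach: scrolledToEnd and window.__lastScrollHeight are names invented here, and it assumes waitFor.options is handed through to Puppeteer's waitForFunction, whose polling option accepts an interval in milliseconds. State is kept on window because the function runs inside the page and cannot see variables from the Node side.

```javascript
// Hypothetical predicate: runs in the page and is polled until it returns true.
function scrolledToEnd() {
  const height = document.documentElement.scrollHeight;
  // Done when the height has not changed since the previous poll.
  const done = window.__lastScrollHeight === height;
  window.__lastScrollHeight = height;
  // Otherwise scroll to the current bottom to trigger the next lazy load.
  if (!done) window.scrollTo(0, height);
  return done;
}

// Would go into the queue() options in place of the waitFor block above.
const waitFor = {
  // Poll once per second instead of every animation frame, so new content
  // has a chance to load between checks; give up after 60 seconds.
  options: { polling: 1000, timeout: 60000 },
  args: [],
  selectorOrFunctionOrTimeout: scrolledToEnd,
};
```

If the site takes longer than the polling interval to append new items, this will stop too early, so the interval probably needs tuning per site (or a check for a loading spinner, as the comment in the original snippet suggests).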

ThisNameWasTaken • Feb 12 '21 21:02

Take a look at the get-set-fetch infinite scrolling example. It may prove a viable alternative. Disclaimer: I'm the repo owner.

a1sabau • Feb 20 '22 21:02

This worked for me:

        customCrawl: async (page, crawl) => {
            await page.setViewport({
                width: 1200,
                height: 800
            });
            const result = await crawl();

            await page.evaluate(scrollToBottom);
            await page.waitFor(3000);
            return result;
        },
...
async function scrollToBottom() {
    await new Promise(resolve => {
        const distance = 100; // should be less than or equal to window.innerHeight
        const delay = 100;
        const timer = setInterval(() => {
            document.scrollingElement.scrollBy(0, distance);
            if (document.scrollingElement.scrollTop + window.innerHeight >= document.scrollingElement.scrollHeight) {
                clearInterval(timer);
                resolve();
            }
        }, delay);
    });
}

michaelpapesch • Feb 21 '22 15:02