crawl4ai [Bug]: scan_full_page doesn't scan full page


Open: justicecurcian opened this issue 7 months ago

crawl4ai version

0.6.0

Expected Behavior

I run the CLI command:

`crwl "https://htreviews.org/tobaccos/darkside" -o md -c scan_full_page=True,scroll_delay=0.5,delay_before_return_html=2`

I expect it to scan the full page.

Current Behavior

It doesn't scan the full page.

Is this reproducible?

Yes

Inputs Causing the Bug

URL: https://htreviews.org/tobaccos/darkside

This page loads content dynamically on scroll, and its scroll handler is somewhat unusual: it doesn't trigger if you simply scroll to the end; you have to scroll up and down for it to load everything. The crawler captures only the initial load plus two additional scroll-triggered loads, which is not the full page.
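One possible explanation for the "scroll up and down" behaviour is an edge-triggered loader: it fires only when the viewport *enters* a zone near the bottom of the page, not while it stays inside that zone. The sketch below is a hypothetical model of such a page (the class `EdgeTriggeredFeed`, its pixel values, and both strategy functions are illustrative assumptions, not the site's real code and not crawl4ai's scrolling logic). It shows how repeatedly jumping to the bottom can stall after a load that is smaller than the trigger zone, while leaving the zone and coming back down keeps loading. It does not try to reproduce the exact number of loads observed above.

```python
from dataclasses import dataclass


@dataclass
class EdgeTriggeredFeed:
    """Hypothetical infinite-scroll page whose loader fires only when the
    viewport *enters* the near-bottom zone. All numbers are illustrative pixels."""
    height: int = 3000       # current document height
    max_height: int = 6000   # height once every item has been loaded
    viewport: int = 800      # visible window height
    zone: int = 600          # "near the bottom" threshold
    batch: int = 500         # height added per load (smaller than zone)
    y: int = 0               # current scroll offset
    in_zone: bool = False    # was the viewport inside the zone last time?
    loads: int = 0           # how many times the loader fired

    def _gap(self) -> int:
        # Distance between the bottom of the viewport and the end of the page.
        return self.height - (self.y + self.viewport)

    def scroll_to(self, y: int) -> None:
        # Clamp the offset to the scrollable range, like a real browser does.
        self.y = max(0, min(y, self.height - self.viewport))
        entering = self._gap() <= self.zone and not self.in_zone
        if entering and self.height < self.max_height:
            self.height = min(self.max_height, self.height + self.batch)
            self.loads += 1
        # Re-evaluate after any growth: a small batch leaves the viewport
        # inside the zone, so the next downward scroll is not an "entry".
        self.in_zone = self._gap() <= self.zone


def scroll_to_bottom_only(page: EdgeTriggeredFeed, rounds: int = 20) -> None:
    """Naive strategy: keep jumping to the current bottom of the page."""
    for _ in range(rounds):
        page.scroll_to(page.height)


def scroll_up_and_down(page: EdgeTriggeredFeed, rounds: int = 20) -> None:
    """Workaround strategy: go to the bottom, step just outside the zone, repeat."""
    for _ in range(rounds):
        page.scroll_to(page.height)
        page.scroll_to(page.height - page.viewport - page.zone - 1)
```

With the default values, `scroll_to_bottom_only` stops after a single load at 3500 px, while `scroll_up_and_down` keeps triggering the loader until the page reaches its 6000 px maximum.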

I've experimented with scroll_delay and delay_before_return_html, but the result doesn't change. I suspect that either there is some limit in the library or the problem lies in the way this website handles scrolling.
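A possible workaround to try is injecting custom JavaScript that scrolls up and down before the HTML is captured. The sketch below uses crawl4ai's Python API (`AsyncWebCrawler`, `CrawlerRunConfig` with `js_code`, `scan_full_page`, `scroll_delay`, `delay_before_return_html`) to mirror the CLI options above. The helper names `build_jiggle_js` and `crawl_with_jiggle`, and the cycle/pause/step-back values, are hypothetical choices, not part of crawl4ai. It also assumes that crawl4ai waits for the promise returned by the snippet; if it does not, `delay_before_return_html` would need to cover roughly `cycles * 2 * pause_ms`. Whether `js_code` runs before or after the built-in full-page scan has not been verified here.

```python
def build_jiggle_js(cycles: int = 10, pause_ms: int = 500, step_back_px: int = 1500) -> str:
    """Hypothetical helper: return a JavaScript snippet that repeatedly scrolls
    to the bottom, backs up by `step_back_px`, and pauses between moves."""
    return f"""
(async () => {{
  const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
  for (let i = 0; i < {cycles}; i++) {{
    window.scrollTo(0, document.body.scrollHeight);
    await sleep({pause_ms});
    window.scrollBy(0, -{step_back_px});
    await sleep({pause_ms});
  }}
  window.scrollTo(0, document.body.scrollHeight);
}})();
"""


async def crawl_with_jiggle(url: str) -> str:
    """Hypothetical wrapper: crawl `url` with the scroll snippet and return Markdown."""
    # Imported lazily so build_jiggle_js can be used without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

    config = CrawlerRunConfig(
        js_code=[build_jiggle_js()],   # assumed to run in the page before capture
        scan_full_page=True,           # same options as the CLI command above
        scroll_delay=0.5,
        delay_before_return_html=2,
    )
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url, config=config)
    return str(result.markdown)
```

Usage: `asyncio.run(crawl_with_jiggle("https://htreviews.org/tobaccos/darkside"))`. Backing up by a fixed number of pixels is the simplest way to leave and re-enter the bottom of the page; the right distance depends on the site and may need tuning.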

Steps to Reproduce

Run `crwl "https://htreviews.org/tobaccos/darkside" -o md -c scan_full_page=True,scroll_delay=0.5,delay_before_return_html=2` in the CLI.

Code snippets

OS

Windows

Python version

3.12

Browser

Chrome

Browser version

No response

Error logs & Screenshots (if applicable)

No response

May 01 '25 20:05 justicecurcian