[Bug]: scan_full_page doesn't scan full page

Open justicecurcian opened this issue 7 months ago • 0 comments

crawl4ai version

0.6.0

Expected Behavior

I run this CLI command:

crwl "https://htreviews.org/tobaccos/darkside" -o md -c scan_full_page=True,scroll_delay=0.5,delay_before_return_html=2

I expect it to scan the full page.

Current Behavior

It doesn't scan the full page.

Is this reproducible?

Yes

Inputs Causing the Bug

URL: https://htreviews.org/tobaccos/darkside

This page loads content dynamically on scroll, and its scroll handler is somewhat unusual: scrolling straight to the bottom doesn't trigger it, so you have to scroll up and down for everything to load. The crawler captures only the initial load plus two scroll-triggered loads, which is not the full page.

I've tried different values of scroll_delay and delay_before_return_html, but the result doesn't change. I suppose there is either some limit in the library, or the problem is with the way this website handles scrolling.
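To make the "scroll up and down" behavior concrete, here is a rough sketch of what I do by hand, written as a small Python helper that builds a JavaScript snippet. Everything in it is an assumption for illustration: the helper name `build_jiggle_scroll_js`, its parameters, and the "stop after three rounds with no height change" rule are hypothetical and are not part of crawl4ai. I have not confirmed that this loads the whole page on this site.

```python
def build_jiggle_scroll_js(max_rounds: int = 30, back_px: int = 600, pause_ms: int = 500) -> str:
    """Return a JS snippet that scrolls to the bottom, nudges back up, then scrolls down again.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    if max_rounds < 1 or back_px < 1 or pause_ms < 0:
        raise ValueError("max_rounds and back_px must be >= 1, pause_ms must be >= 0")

    # Placeholders are substituted with str.replace so the JS braces need no escaping.
    template = """
(async () => {
  const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
  let lastHeight = 0;
  let stableRounds = 0;
  // Stop after MAX rounds, or once the page height has not grown for 3 rounds in a row.
  for (let i = 0; i < MAX_ROUNDS && stableRounds < 3; i++) {
    window.scrollTo(0, document.body.scrollHeight);  // jump to the current bottom
    await sleep(PAUSE_MS);
    window.scrollBy(0, -BACK_PX);                    // nudge back up
    await sleep(PAUSE_MS);
    window.scrollTo(0, document.body.scrollHeight);  // and down again
    await sleep(PAUSE_MS);
    const height = document.body.scrollHeight;
    stableRounds = height === lastHeight ? stableRounds + 1 : 0;
    lastHeight = height;
  }
})();
"""
    return (
        template.replace("MAX_ROUNDS", str(max_rounds))
        .replace("BACK_PX", str(back_px))
        .replace("PAUSE_MS", str(pause_ms))
        .strip()
    )


if __name__ == "__main__":
    print(build_jiggle_scroll_js(max_rounds=10, back_px=400, pause_ms=500))
```

The generated string could then be handed to whatever script-injection option the crawler offers. I believe crawl4ai exposes this as a `js_code` setting, but I have not checked how it behaves in 0.6.0, or whether the crawler waits for the async function to finish before returning the HTML.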

Steps to Reproduce

Run `crwl "https://htreviews.org/tobaccos/darkside" -o md -c scan_full_page=True,scroll_delay=0.5,delay_before_return_html=2` in the CLI.

Code snippets


OS

Windows

Python version

3.12

Browser

Chrome

Browser version

No response

Error logs & Screenshots (if applicable)

No response

justicecurcian, May 01 '25 20:05