crawl4ai
crawl4ai copied to clipboard
[Bug]: scan_full_page doesn't scan full page
crawl4ai version
0.6.0
Expected Behavior
I run cli command
crwl "https://htreviews.org/tobaccos/darkside" -o md -c scan_full_page=True,scroll_delay=0.5,delay_before_return_html=2
I expect it to scan full page
Current Behavior
It doesn't scan full page
Is this reproducible?
Yes
Inputs Causing the Bug
URL: https://htreviews.org/tobaccos/darkside
This page dynamically loading content on scroll and it has kinda weird scroll handler that doesn't work if you just scroll in the end, you have to scroll up and down for it to load everything. It scrapes only first load with two loads on scroll, but it's not the full page.
I've played with scroll_delay and delay_before_return_html, it doesn't change. I suppose there is either some limit in the library or maybe the problem is with this website way of handling scroll.
Steps to Reproduce
Run `crwl "https://htreviews.org/tobaccos/darkside" -o md -c scan_full_page=True,scroll_delay=0.5,delay_before_return_html=2` in cli
Code snippets
OS
Windows
Python version
3.12
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
No response