crawl4ai

Incorrect scraped content (another page's content is scraped)

Open · jtha opened this issue 3 months ago · 1 comment

I noticed some strange behaviour while doing retrieval: the scraped content does not match the URL provided and instead comes from a different page. I have reproduced this a few times, and so far it appears to be triggered by setting magic=True. My guess is that the simulated user behaviour inadvertently clicks a link on the page.

Turning magic off and enabling the protection methods individually, with the exception of simulate_user=True, seems to make it behave as intended, at least as far as I can see. For reference, this was happening on Weaviate's documentation page, which has many links in the nav bar, side bar and main content area: basically links everywhere.
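For anyone who wants to compare the two setups, here is a rough sketch of what I mean. It assumes the options are passed as keyword arguments to arun, as in the version I was using; newer releases may expect them on a config object instead. override_navigator is only my assumed example of one of "the protection methods", and the crawl and compare helpers are hypothetical names for illustration, not part of the library.

```python
import asyncio

# Configuration from the report that returned another page's content.
FAILING_KWARGS = {"magic": True}

# Workaround: magic off, simulate_user off, other protections enabled one by one.
# override_navigator is an assumed example of such a protection option.
WORKAROUND_KWARGS = {
    "magic": False,
    "simulate_user": False,
    "override_navigator": True,
}


async def crawl(url: str, **kwargs):
    """Crawl a single URL with the given options (hypothetical helper)."""
    # Imported here so the module loads even if crawl4ai is not installed.
    from crawl4ai import AsyncWebCrawler

    async with AsyncWebCrawler() as crawler:
        return await crawler.arun(url=url, **kwargs)


def compare(url: str) -> bool:
    """Return True if the two configurations yield different markdown."""
    failing = asyncio.run(crawl(url, **FAILING_KWARGS))
    fixed = asyncio.run(crawl(url, **WORKAROUND_KWARGS))
    return failing.markdown != fixed.markdown
```

Calling compare on a link-heavy page should show whether the two configurations disagree, though a difference on its own does not prove a stray click, since dynamic content can also change between two crawls.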

jtha, Nov 15 '24 11:11