crawl4ai

[Bug]: result.links is empty (internal and external lists) when using raw_html_url = f"raw:{raw_html}" as input

Open mllife opened this issue 10 months ago • 2 comments

crawl4ai version

0.4.247

Expected Behavior

The page contains links to other pages, which should be returned in result.links.

Current Behavior

all_links = result.links.get("internal", []) + result.links.get("external", []) # always empty

when doing

raw_html_url = f"raw:{raw_html}"

async with C4AIAsyncWebCrawler(config=browser_config) as crawler:
    result = await crawler.arun(raw_html_url, config=crawler_config)

Is this reproducible?

Yes

Inputs Causing the Bug


Steps to Reproduce


Code snippets


OS

macOS

Python version

3.11.9

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

mllife Feb 13 '25 13:02

@mllife Can you share some sample raw_html inputs that cause the crawler to return empty links in the result?

aravindkarnam Feb 14 '25 11:02

All of them. I generated the HTML by crawling this page with the crawler: "https://www.bankofcanada.ca/press/". When I read that HTML back from disk and pass it in as raw input, result.links is empty.

mllife Feb 14 '25 13:02

This is fixed in the latest release (0.7.4).

ntohidi Aug 18 '25 08:08
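
For anyone who cannot upgrade yet, one way to check whether saved HTML contains links at all, independently of crawl4ai, is to parse it with the Python standard library. The sketch below is not crawl4ai's implementation. The names LinkCollector and split_links are hypothetical, and base_url is an assumption the caller must supply, since a raw HTML string does not carry the URL it was fetched from. It returns plain URL strings, which may differ from the shape of the entries in result.links.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it parses."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore <a> tags with a missing or empty href
                self.hrefs.append(href)


def split_links(raw_html, base_url):
    """Return {"internal": [...], "external": [...]} of absolute URLs.

    base_url is an assumption supplied by the caller; it is used both to
    resolve relative hrefs and to decide which host counts as internal.
    """
    collector = LinkCollector()
    collector.feed(raw_html)
    base_host = urlparse(base_url).netloc
    links = {"internal": [], "external": []}
    for href in collector.hrefs:
        absolute = urljoin(base_url, href)
        parsed = urlparse(absolute)
        if parsed.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar schemes
        key = "internal" if parsed.netloc == base_host else "external"
        links[key].append(absolute)
    return links
```

With the HTML from the report read back from disk into raw_html, calling split_links(raw_html, "https://www.bankofcanada.ca/press/") gives a quick sanity check: if it finds links while result.links is empty, the saved file is fine and the problem lies in how the raw: input is handled.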