[Bug]: Forward slashes of `raw://` are not removed when converting raw URLs to HTML

Opened by jl-martins

crawl4ai version

0.4.248

Expected Behavior

The `raw://` prefix is stripped completely, leaving only the raw HTML.

Current Behavior

Only `raw:` is stripped. The `//` is left in front of the HTML.
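One plausible cause, sketched under an assumption about the implementation (this is not crawl4ai's actual code): if the prefix check tests `raw:` before `raw://`, a `raw://` URL matches the shorter prefix first, so only four characters are removed. The hypothetical functions `strip_raw_prefix_buggy` and `strip_raw_prefix_fixed` below illustrate the behavior and one order-independent fix, which is to test the longer prefix first.

```python
def strip_raw_prefix_buggy(url: str) -> str:
    """Hypothetical: checks the short prefix first, so 'raw://' matches 'raw:'."""
    if url.startswith("raw:"):
        return url[len("raw:"):]    # leaves '//' behind for 'raw://' URLs
    if url.startswith("raw://"):
        return url[len("raw://"):]  # unreachable for 'raw://' URLs
    return url


def strip_raw_prefix_fixed(url: str) -> str:
    """Hypothetical fix: check the longer prefix before the shorter one."""
    for prefix in ("raw://", "raw:"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


html = "<html><body><h1>Heading</h1></body></html>"
print(strip_raw_prefix_buggy("raw://" + html))  # //<html><body><h1>Heading</h1></body></html>
print(strip_raw_prefix_fixed("raw://" + html))  # <html><body><h1>Heading</h1></body></html>
```

Ordering the candidates from longest to shortest keeps both `raw:` and `raw://` inputs working.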

Is this reproducible?

Yes

Inputs Causing the Bug

Any URL that starts with `raw://`

Steps to Reproduce

Run the following script:

```python
import asyncio
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig

async def main():
    async with AsyncWebCrawler() as crawler:
        # Pass raw HTML directly, using the raw:// prefix
        result = await crawler.arun(
            url="raw://<html><body><h1>Heading</h1></body></html>",
            config=CrawlerRunConfig(),
        )
        print(result.markdown)

if __name__ == "__main__":
    asyncio.run(main())
```

OS

All

Python version

All

Browser

All

Browser version

All

Feb 15 '25 15:02 jl-martins