crawl4ai
[Bug]: Forward slashes of `raw://` are not removed when converting raw URLs to HTML
crawl4ai version
0.4.248
Expected Behavior
The `raw://` prefix is removed completely, leaving only the raw HTML that follows it.
Current Behavior
Only `raw:` is removed. The `//` that follows it stays at the start of the HTML that gets processed.
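The difference between the two behaviors can be illustrated with a minimal sketch of prefix stripping. It rests on an assumption that has not been confirmed against the crawl4ai source: the shorter `raw:` prefix is checked before `raw://`. Because every `raw://` URL also starts with `raw:`, that check always matches first and only four characters are stripped. Both function names are hypothetical and not part of crawl4ai's API.

```python
def strip_raw_prefix_buggy(url: str) -> str:
    """Hypothetical helper reproducing the reported behavior."""
    # "raw:" is checked first, so it also matches "raw://" and leaves "//" behind.
    if url.startswith("raw:"):
        return url[len("raw:"):]
    if url.startswith("raw://"):  # never reached for raw:// URLs
        return url[len("raw://"):]
    return url


def strip_raw_prefix(url: str) -> str:
    """Hypothetical helper showing the expected behavior."""
    # Check the longer prefix first so "raw://" is removed in full.
    for prefix in ("raw://", "raw:"):
        if url.startswith(prefix):
            return url[len(prefix):]
    return url


raw_url = "raw://<html><body><h1>Heading</h1></body></html>"
print(strip_raw_prefix_buggy(raw_url))  # //<html><body><h1>Heading</h1></body></html>
print(strip_raw_prefix(raw_url))        # <html><body><h1>Heading</h1></body></html>
```

Checking the longest prefix first keeps plain `raw:` URLs working while ensuring `raw://` is stripped whole.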
Is this reproducible?
Yes
Inputs Causing the Bug
Any URL that starts with `raw://`
Steps to Reproduce
Run the following script:
```python
import asyncio

from crawl4ai import AsyncWebCrawler, CrawlerRunConfig


async def main():
    async with AsyncWebCrawler() as crawler:
        # Pass raw HTML directly through the raw:// scheme
        result = await crawler.arun(
            url="raw://<html><body><h1>Heading</h1></body></html>",
            config=CrawlerRunConfig(),
        )
        # With the bug, the processed HTML still begins with "//"
        print(result.markdown)


if __name__ == "__main__":
    asyncio.run(main())
```
OS
All
Python version
All
Browser
All
Browser version
All