Feature Request: Don't force http/s for websites. Crawl4ai supports file:// for local html.

Open jchappo opened this issue 4 months ago • 3 comments

title

Aug 26 '25 14:08 jchappo

Makes sense! Added this to the board

Aug 26 '25 23:08 coleam00

I got this partially working by adding file:// to the supported schemes. I also needed to mount a local volume into the server's Docker container so that it can read local files.
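A minimal sketch of what adding file:// to the supported schemes could look like. The names SUPPORTED_SCHEMES and is_supported_url are hypothetical and are not taken from Archon's code; the actual validation may live somewhere else and look different.

```python
from urllib.parse import urlparse

# Hypothetical allow-list; the real project may define its schemes elsewhere.
SUPPORTED_SCHEMES = {"http", "https", "file"}


def is_supported_url(url: str) -> bool:
    """Return True if the URL uses an allowed scheme and names a target."""
    parsed = urlparse(url)
    if parsed.scheme not in SUPPORTED_SCHEMES:
        return False
    if parsed.scheme == "file":
        # file:// URLs normally have an empty host, so check the path instead.
        return bool(parsed.path)
    # http/https URLs need a host.
    return bool(parsed.netloc)
```

With this check, file:///data/docs/index.html passes, while a bare path such as /data/docs/index.html is still rejected because it has no scheme.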

I can get it to crawl a top-level local file, but it returns local links without the file:// scheme, so the crawl fails to go any further.
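One possible workaround, sketched here with only the Python standard library, is to resolve each returned link against the URL of the page it came from before queuing it. The helper name resolve_local_link and the example paths are hypothetical, and this is not how crawl4ai or crawling_service.py currently handle links.

```python
from urllib.parse import urljoin


def resolve_local_link(page_url: str, href: str) -> str:
    """Resolve a link found on page_url into an absolute URL.

    Relative and scheme-less links inherit the scheme of the page they
    were found on, so links on a file:// page stay file:// URLs.
    Links that already carry a scheme are returned unchanged.
    """
    return urljoin(page_url, href)


base = "file:///data/docs/index.html"
print(resolve_local_link(base, "guide/setup.html"))
print(resolve_local_link(base, "/data/docs/api.html"))
```

urljoin treats file as a scheme that supports relative references, so both a relative link and an absolute local path come back with the file:// prefix restored.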

I'm putting together a small example with crawl4ai to help figure out where to make the changes in crawling_service.py.

Aug 26 '25 23:08 jchappo

A significant use case: consider repos such as azure-docs.

When a "docs" repo exists, you can git pull it outside of Archon and then crawl your local filesystem. If the docs change, git pull the local copy again and recrawl it in Archon.
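As a rough sketch of the local side of that workflow, the snippet below lists the documentation files in a checked-out repo as file:// URLs that could then be handed to a crawler. The function name local_doc_urls and the suffix list are assumptions for illustration, and nothing here calls Archon or crawl4ai.

```python
from pathlib import Path


def local_doc_urls(root, suffixes=(".html", ".htm", ".md")):
    """Return sorted file:// URLs for documentation files under root."""
    # as_uri() requires an absolute path, so resolve the root first.
    root_path = Path(root).resolve()
    return sorted(
        path.as_uri()
        for path in root_path.rglob("*")
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

After each git pull, calling this again on the same directory would pick up added or removed files, giving a fresh list to recrawl.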

Sep 04 '25 20:09 finnaGIT