NotImplementedError in asyncio.create_subprocess_exec on Windows
Issues with crawl4ai Library
1. NotImplementedError in asyncio.create_subprocess_exec on Windows
Description
The crawl4ai library starts the Playwright browser through asyncio.create_subprocess_exec, which is not supported by the selector-based asyncio event loop used in some Windows configurations. This results in a NotImplementedError, preventing the AsyncWebCrawler from functioning correctly on Windows platforms.
Steps to Reproduce
- Run the code on a Windows platform.
- Use the AsyncWebCrawler to crawl a URL.
- Observe the NotImplementedError when starting the Playwright browser.
Expected Behavior
The AsyncWebCrawler should start the Playwright browser and crawl the URL without raising a NotImplementedError.
Actual Behavior
The code raises a NotImplementedError when attempting to start the Playwright browser using asyncio.create_subprocess_exec.
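For context, the workaround that surfaces repeatedly later in this thread comes down to the event loop in use: on Windows, the selector-based asyncio loop does not implement subprocess transports, while the proactor loop does. Below is a minimal sketch of that fix; the sys.platform guard and the stdlib subprocess call are illustrative stand-ins, not crawl4ai code.

```python
import asyncio
import sys

# On Windows, SelectorEventLoop raises NotImplementedError from
# asyncio.create_subprocess_exec; ProactorEventLoop implements it.
# Setting the policy before any loop is created sidesteps the error.
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

async def main() -> str:
    # Stand-in for Playwright's browser launch: any subprocess spawn
    # goes through the same loop machinery that raised the error.
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('ok')",
        stdout=asyncio.subprocess.PIPE,
    )
    out, _ = await proc.communicate()
    return out.decode().strip()

if __name__ == "__main__":
    print(asyncio.run(main()))  # prints: ok
```

Note that Python 3.8+ already defaults to the proactor loop on Windows; the error tends to appear when a framework (uvicorn's reloader, Jupyter, Streamlit) installs the selector loop instead, which is why setting the policy explicitly helps.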
@KaifAhmad1 Thanks for sharing this. I will check on a Windows machine and update you. In the meantime, if you share the code snippet you use, that would help: I can check whether anything is missed in the way you call Crawl4AI. Please share that with me as well.
@unclecode Any updates on this issue? I'm facing a similar issue myself. The behaviour is slightly confusing to me: when I just use it as a script to test, it works fine and crawls as expected.
But then, for my use case, I tried using it with FastAPI in my POST route, and it started throwing the same NotImplementedError.
I'm using a Windows 10 machine with crawl4ai version 0.3.71. (The latest version does not work even for the script; it was throwing some Playwright-related errors, and through GitHub issues I found that this version seems stable.)
@Aniket1026 Hello, thank you for using Crawl4AI! Could you please share your exact code or snippet with me?
When I tried it on my Windows machine, I didn’t encounter this issue, so I’d like to investigate further. Please also share the specs of your operating system, Python version, and any other details you think might help. Once I have that, I’ll try to replicate the error on my end.
This is how the script I used to test looks, and it works fine with crawl4ai==0.3.71. With the latest version, 0.4.0, it does not work at all. You can test this script using the URL of any Amazon product. Command to run: python filename
import asyncio
import json
from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import ExtractionStrategy
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy

product_schema = {
    "name": "Product Details",
    "baseSelector": "div#centerCol",
    "fields": [
        {
            "name": "title",
            "selector": "span#productTitle",
            "type": "text",
        }
    ],
}

class UnableToCrawlProductError(Exception):
    pass

async def extract(url: str, extraction_strategy: ExtractionStrategy) -> list[dict]:
    async with AsyncWebCrawler(verbose=True, headless=True) as crawler:
        result = await crawler.arun(
            url=url,
            extraction_strategy=extraction_strategy,
            bypass_cache=True,
            verbose=False,
        )
        assert result.success, "Failed to crawl the page"
        return json.loads(result.extracted_content)

url = input("Enter the product URL: ")
try:
    details: list = asyncio.run(
        extract(
            url=url,
            extraction_strategy=JsonCssExtractionStrategy(schema=product_schema),
        )
    )
    print("My product Detail : ", details)
except UnableToCrawlProductError as e:
    print(e)
    exit(1)
Then, for my use case, I tried using it with FastAPI in my POST route, which looks like below. You can start the uvicorn server and send a POST request with raw JSON, using "url" as the key and any Amazon product URL as the value. Command to start the server: uvicorn filename:app --reload
from fastapi import FastAPI
from fastapi import HTTPException
from pydantic import BaseModel, HttpUrl
from crawl4ai import AsyncWebCrawler
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
from crawl4ai.extraction_strategy import ExtractionStrategy
import json

product_schema = {
    "name": "Product Details",
    "baseSelector": "div#centerCol",
    "fields": [
        {
            "name": "title",
            "selector": "span#productTitle",
            "type": "text",
        }
    ],
}

app = FastAPI()

class ProductUrl(BaseModel):
    url: HttpUrl

async def extract(url: str, extraction_strategy: ExtractionStrategy) -> list[dict]:
    async with AsyncWebCrawler(verbose=True, headless=True) as crawler:
        result = await crawler.arun(
            url=url,
            extraction_strategy=extraction_strategy,
            bypass_cache=True,
            verbose=False,
        )
        assert result.success, "Failed to crawl the page"
        return json.loads(result.extracted_content)

@app.get("/")
async def root():
    return {"message": "Hello test"}

@app.post("/get-details")
async def compare_product(product_url: ProductUrl):
    url = product_url.url
    try:
        product_info = await extract(
            url=url,
            extraction_strategy=JsonCssExtractionStrategy(product_schema),
        )
        return {"product_info": product_info}
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000, reload=True)
With this code, I get the error below when I hit the POST request via Postman:
Task exception was never retrieved
future: <Task finished name='Task-5' coro=<Connection.run() done, defined at C:\....\product-compare\backend\venv\lib\site-packages\playwright\_impl\_connection.py:265> exception=NotImplementedError()>
Traceback (most recent call last):
File "C:\.....\backend\venv\lib\site-packages\playwright\_impl\_connection.py", line 272, in run
await self._transport.connect()
File "C:\.....\backend\venv\lib\site-packages\playwright\_impl\_transport.py", line 133, in connect
raise exc
File "C:\...\backend\venv\lib\site-packages\playwright\_impl\_transport.py", line 120, in connect
self._proc = await asyncio.create_subprocess_exec(
File "C:\.s\Python\Python310\lib\asyncio\subprocess.py", line 218, in create_subprocess_exec
transport, protocol = await loop.subprocess_exec(
File "C:\...\Python\Python310\lib\asyncio\base_events.py", line 1667, in subprocess_exec
transport = await self._make_subprocess_transport(
File "C:\....\Python\Python310\lib\asyncio\base_events.py", line 498, in _make_subprocess_transport
raise NotImplementedError
NotImplementedError
My dependencies look like this:
fastapi==0.115.6
requests==2.32.3
crawl4ai==0.3.71
validators==0.34.0
uvicorn==0.32.1
OS: Windows 10
Python: 3.10.4
pip: 22.0.4
If there's anything more I could provide, please let me know. I hope this helps you reproduce the same error.
Hi @Aniket1026, cc @unclecode
I couldn’t reproduce the exact behavior you mentioned, but when following your approach, I encountered the following error:
error: Page.goto: Object of type HttpUrl is not JSON serializable
This issue seems to be resolved by modifying the code as shown below:
class ProductUrl(BaseModel):
url: str
You can handle URL validations afterward to ensure correctness.
My Environment:
- OS: macOS Sequoia (15.1.1)
This might be more of a Windows-specific issue, though. If someone using Windows could confirm, that would be helpful! 🙇🏼
Thanks!
Hi @hitesh22rana, thank you for trying it out and coming up with a suggestion. But even with the suggested changes, I'm still encountering the same issue. I believe you have a different OS, and that could be why you don't see the same behaviour.
@Aniket1026,
This seems to be related to an issue with uvicorn. I found the following reference that might be helpful: GitHub Issue #964
The error occurs because FastAPI uses uvloop, and asyncio doesn’t automatically recognize this without explicitly setting a policy. There’s a helpful answer that outlines hooks to achieve this: StackOverflow thread
Please try to set the following in your code:
import asyncio
import uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
Hope this helps! 🙇🏼
@Aniket1026 Regarding the first part, your script works well on my side and this is the output:
[INIT].... → Crawl4AI 0.4.1
[FETCH]... ↓ https://www.amazon.com/Sapiens-Humankind-Yuval-Noa... | Status: True | Time: 15.55s
[SCRAPE].. ◆ Processed https://www.amazon.com/Sapiens-Humankind-Yuval-Noa... | Time: 714ms
[EXTRACT]. ■ Completed for https://www.amazon.com/Sapiens-Humankind-Yuval-Noa... | Time: 0.5135177089832723s
[COMPLETE] ● https://www.amazon.com/Sapiens-Humankind-Yuval-Noa... | Status: True | Total: 16.78s
My product Detail : [{'title': 'Sapiens: A Brief History of Humankind'}]
For the FastAPI server, remember that the url you pass to the arun() function should be a string, and you are passing an HttpUrl. Following is the full code, after a few modifications, that works fine on my machine; please try it and let me know. @hitesh22rana thx for the support.
import os, sys
from fastapi import FastAPI
from fastapi import HTTPException
from pydantic import BaseModel, HttpUrl
from crawl4ai import AsyncWebCrawler, CacheMode
from crawl4ai.extraction_strategy import JsonCssExtractionStrategy
from crawl4ai.extraction_strategy import ExtractionStrategy
import json

product_schema = {
    "name": "Product Details",
    "baseSelector": "div#centerCol",
    "fields": [
        {
            "name": "title",
            "selector": "span#productTitle",
            "type": "text",
        }
    ],
}

class UnableToCrawlProductError(Exception):
    pass

async def extract(url: str, extraction_strategy: ExtractionStrategy) -> list[dict]:
    async with AsyncWebCrawler(verbose=True, headless=True) as crawler:
        result = await crawler.arun(
            url=url,
            extraction_strategy=extraction_strategy,
            cache_mode=CacheMode.BYPASS,
        )
        assert result.success, "Failed to crawl the page"
        return json.loads(result.extracted_content)

# url = input("Enter the product URL: ")
# url = "https://www.amazon.com/Sapiens-Humankind-Yuval-Noah-Harari/dp/0062316095"

app = FastAPI()

class ProductUrl(BaseModel):
    url: HttpUrl

@app.get("/")
async def root():
    return {"message": "Hello test"}

@app.post("/get-details")
async def compare_product(product_url: ProductUrl):
    url = product_url.url
    try:
        product_info = await extract(
            url=str(url),
            extraction_strategy=JsonCssExtractionStrategy(product_schema),
        )
        return {"product_info": product_info}
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8081)
I tested like this:
$ curl -X POST "http://localhost:8081/get-details" \
-H "Content-Type: application/json" \
-d '{"url": "https://www.amazon.com/Sapiens-Humankind-Yuval-Noah-Harari/dp/0062316095"}'
In 0.4.1, I am adding a step to raise an error if the url is not a string. I saw a similar issue in another case, so it's better to make it explicit.
@unclecode Thanks for looking into the issue. I tried your given solution, and the same NotImplementedError still occurs. I tried with both crawl4ai version 0.4.0 and 0.3.71 (as it seemed a bit more stable when testing the script). As @hitesh22rana mentioned, the issue actually lies with uvicorn and how it starts subprocesses on Windows.
But the fix he provided, while useful on other machines, doesn't work in my case: uvloop neither comes with uvicorn nor can I install it via pip, as it isn't even supported on Windows, lol.
What I finally found is that when using uvicorn to start the server, I can avoid the --reload flag, which actually fixes the problem; but then I have to restart the server after every change, since without --reload it doesn't watch for changes.
For now I've switched to using nodemon in my development environment. The solution I found only works with crawl4ai==0.3.71. I tried it with 0.4.0, but even with this solution of not using the --reload flag, the NotImplementedError still continues.
For now I'll stick with 0.3.71, as it seems more stable for my use case. Thanks to both of you for looking into the issue @unclecode @hitesh22rana.
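For reference, the difference described above comes down to the two launch commands (here `filename` is a placeholder for the module containing the FastAPI app, matching the earlier snippets):

```shell
# Triggers NotImplementedError on Windows in this setup:
uvicorn filename:app --reload

# Workaround reported above: run without auto-reload
# (the server must then be restarted manually after changes)
uvicorn filename:app
```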
@Aniket1026 You’re very welcome, no worries! I definitely want to ensure you get a good response with 0.4.x. Please share as many details about your platform as you can, so I can try to simulate it and reproduce the issue myself.
I am facing the same issue on Windows as well.
@psychicDivine Would you please share your code snippet as well? Thx
Hey, I solved the NotImplementedError on Windows:

import asyncio
from asyncio import ProactorEventLoop
from crawl4ai import AsyncWebCrawler, CacheMode
import streamlit as st

asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

st.title("crawl4AI")
if "crawl_result" not in st.session_state:
    st.session_state["crawl_result"] = ""

async def main():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(url="https://www.inven.co.kr/board/lostark/6271?my=chuchu")
        print(result.markdown)
        st.session_state["crawl_result"] = result.markdown
        st.write(st.session_state["crawl_result"])

if __name__ == "__main__":
    asyncio.run(main())

Change your event loop policy to WindowsProactorEventLoopPolicy().
@rech4210 I will try to handle this in the code checking the environment.
Hi,
I am experiencing the same issue on Windows. When using the AsyncWebCrawler to crawl a URL, the code raises a NotImplementedError in asyncio.create_subprocess_exec.
Here are my environment details:
- Windows Version: Windows 11 24H2
- Python Version: 3.11.7
- crawl4ai Version: 0.4.246
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(
            url="https://cbr.ru/news"
        )
        print(result.markdown)

asyncio.run(main())
On macOS, it worked fine.
@Xtreemrus Please try asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy()) at the beginning of your code and let me know if it helps.
I am facing the same issue on Windows as well.
Running the script directly works fine, but calling it through a FastAPI endpoint reports errors.
After many attempts, I found that uvicorn cannot have the --reload parameter.
If you remove it, you can invoke the endpoint correctly.
I don't know why.
And using asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy()) at the beginning of the code does not work for me.
@skywolf123 would you please explain where you have removed the "--reload"?
@unclecode In my opinion, what @skywolf123 meant was the use of the --reload flag while running the FastAPI app via uvicorn.
E.g.
uvicorn app.main:app --reload
This seems to be similar to the issue shared earlier by @Aniket1026: https://github.com/unclecode/crawl4ai/issues/282#issuecomment-2527408422
@hitesh22rana Yes, I understood that part. I just want to know from which file or part of Crawl4AI he found this. I want to check if I missed removing --reload from somewhere after debugging, because I can’t find it in the FastAPI server within the Docker setup.
Oh, I mean @skywolf123 might be running their own FastAPI server locally with the reload flag enabled. From there, they could be invoking a function via an endpoint, which in turn calls crawl4ai.
This doesn't seem to have anything to do with crawl4ai directly, though, as the --reload flag is likely part of their local setup rather than the Dockerized FastAPI server.
I see, yes most likely you are right, thx for explanation @hitesh22rana
Thanks @unclecode and @rech4210, I was having the problem with NotImplementedError as well when using Streamlit. Thanks to asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy()) I was able to solve this, although I would appreciate it if you could tell me what the problem is and how changing the event loop policy helps fix it.
Thanks again!!
Ran into the same error when executing the crawl4ai_quickstart.ipynb on
- windows 11
- python 3.11 and 3.12
- crawl4ai 0.5.0.post4
It was solved by running it as a script instead of in the Jupyter notebook: https://stackoverflow.com/questions/44633458/why-am-i-getting-notimplementederror-with-async-and-await-on-windows/76981596#76981596
@Aniket1026
You can do this on Windows:
main.py
import asyncio
import sys
import os

# **Event loop strategy must be set before importing any module**
if sys.platform == "win32":
    asyncio.set_event_loop_policy(asyncio.WindowsProactorEventLoopPolicy())

from fastapi import FastAPI
import uvicorn

# Replace with your actual routing module name
from your_router_module import router

app = FastAPI()
app.include_router(router)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000)
your_router.py
from fastapi import APIRouter, HTTPException, status
from fastapi.responses import JSONResponse
from crawl4ai import *

router = APIRouter(
    prefix="/crawl4ai",
)

@router.get("/test")  # register as a GET route (path chosen for illustration)
async def test_crawl4ai_async():
    """test crawl"""
    try:
        async with AsyncWebCrawler(
            headless=True,
            verbose=False
        ) as crawler:
            result = await crawler.arun(
                url="https://www.nbcnews.com/business",
                word_count_threshold=10,
                bypass_cache=True
            )
            return JSONResponse({
                "status": "success",
                "method": "async",
                "data": result.markdown[:1000] + "..." if len(result.markdown) > 1000 else result.markdown,
                "full_length": len(result.markdown)
            }, status_code=status.HTTP_200_OK)
    except Exception as e:
        raise HTTPException(
            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
            detail=f"error: {str(e)}"
        )
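Assuming the server above is started on port 8000 and the test coroutine is registered as a GET route at /crawl4ai/test (a hypothetical path, chosen to match the router's prefix), the endpoint could be exercised the same way as the earlier curl example:

```shell
curl "http://localhost:8000/crawl4ai/test"
```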