crawl4ai
cannot import name 'WebCrawler' from 'crawl4ai'
Hi, when I try to run crawl4ai with Microsoft Edge on Windows, I get the error below (the same code works on Ubuntu with Chrome):
Traceback (most recent call last):
File "d:\work\indexing\scrapper.py", line 1, in
and here is my code below:
from crawl4ai import WebCrawler
import json

# Raw strings keep the backslashes in Windows paths literal
with open(r'D:\work\indexing\com\scrapped_urls.json', 'r') as file:
    json_data = json.load(file)
print(type(json_data))

# Create an instance of WebCrawler
crawler = WebCrawler()
# Warm up the crawler (load necessary models)
crawler.warmup()

scrapped_file = r'D:\work\indexing\com\xyz.txt'

# Iterate through the JSON array
for item in json_data:
    # print("The url ", item["url"], " is scrapping...")
    # Run the crawler on a URL
    result = crawler.run(url=item["url"])
    # Append the scraped text to the output file
    with open(scrapped_file, "a") as f:
        f.write(result.markdown)
Do you have any idea?
Thanks for using our library. I do have a question. When you say you are running our library with Microsoft Edge and Windows, could you please clarify what you mean by that? Crawl4AI does not have any integration with Microsoft Edge or any other browser on your computer. So I'm guessing you might be experiencing an error related to Windows itself. If that's the case, I will run some additional tests on Windows to determine the root cause of the issue. I will also review the code you shared to see if I can identify the problem. Meanwhile, we are working on adding a scraping engine to the library, so please stay tuned for that update.
Hi @unclecode, thanks for your interest in our problem (I work together with @gulnihalk).
I wrote this code on Ubuntu, where my browser is Chrome. It scrapes all the URLs in the JSON file very well. The library is really good work! But when we try exactly the same code (except for the file paths) on a Windows machine that has only the Microsoft Edge browser, we get this error:
ImportError: cannot import name 'WebCrawler' from 'crawl4ai' (C:\Users\abc..\Local\Programs\Python\Python310\lib\site-packages\crawl4ai\__init__.py)
Even after installing every crawl4ai dependency we could find, and even after changing the classes in the source code for Edge (for example, changing self.driver = webdriver.Chrome(service=self.service) to self.driver = webdriver.Edge(service=self.service) in crawler_strategy.py), it still doesn't work. Maybe those parts of the source are related to Selenium. The Selenium part appears in the source code like this:
@asumansaree Sorry for my late response; I've been on a short trip. I figured out why it behaves this way. Your code is written for the previous version, which was synchronous by default; the library is now asynchronous by default. To use it in sync mode, you have to import the web crawler directly from its module: from crawl4ai.web_crawler import WebCrawler. I suggest you switch to async mode, which uses Playwright and is faster and more capable. Please refer to the documentation and examples; it's a significant improvement. Here is a code example for the async version:
import asyncio
from crawl4ai import AsyncWebCrawler

async def simple_crawl():
    async with AsyncWebCrawler(verbose=True) as crawler:
        result = await crawler.arun(url="https://www.nbcnews.com/business")
        print(result.markdown[:500])

async def main():
    await simple_crawl()

if __name__ == "__main__":
    asyncio.run(main())
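For the script in the original question, the JSON loop could be ported to the async API roughly as follows. This is only a sketch under assumptions: crawl_to_file and main are hypothetical helper names, the paths in the final comment are the placeholders from the question, and it assumes result.markdown can be converted to text with str(). Only AsyncWebCrawler, arun and result.markdown come from the example above.

```python
import asyncio  # needed for asyncio.run(...) in the example call below
import json
from pathlib import Path


async def crawl_to_file(crawler, items, out_path):
    """Crawl each item's "url" and append the resulting Markdown to out_path.

    `crawler` is any object with an async `arun(url=...)` method whose result
    exposes a `.markdown` attribute (assumed here to match AsyncWebCrawler).
    Returns the number of URLs processed.
    """
    count = 0
    # Open the output file once instead of reopening it for every URL.
    with open(out_path, "a", encoding="utf-8") as out:
        for item in items:
            result = await crawler.arun(url=item["url"])
            # str() is an assumption: it guards against a non-str markdown value.
            out.write(str(result.markdown or ""))
            count += 1
    return count


async def main(json_path, out_path):
    # Imported lazily so the helper above can be tried without crawl4ai installed.
    from crawl4ai import AsyncWebCrawler

    items = json.loads(Path(json_path).read_text(encoding="utf-8"))
    async with AsyncWebCrawler(verbose=True) as crawler:
        return await crawl_to_file(crawler, items, out_path)


# Example call (placeholder paths taken from the question):
# asyncio.run(main(r"D:\work\indexing\com\scrapped_urls.json",
#                  r"D:\work\indexing\com\xyz.txt"))
```

Passing the crawler in as an argument keeps a single browser session open for the whole list and makes the loop easy to try with a stand-in object before pointing it at real URLs.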
Hi, @unclecode, I have the same problem when running your test code:
from crawl4ai.web_crawler import WebCrawler
NameError: name 'asynccontextmanager' is not defined
It seems that the package does not work in my IDE, and I cannot find a solution.
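Since two different errors have come up in this thread (an ImportError and a NameError raised at import time), a small probe can show which of the import paths mentioned here actually works in a given environment. This is a minimal sketch, not part of crawl4ai: probe_imports and CRAWL4AI_CANDIDATES are hypothetical names, and the probe looks names up with getattr, so a missing name is reported as an AttributeError where a plain "from ... import ..." statement would raise ImportError.

```python
import importlib


def probe_imports(candidates):
    """Try each (module, attribute) pair and report what happened.

    Returns a dict mapping "module.attribute" to "ok", or to the exception
    type and message raised while importing the module or looking up the name.
    """
    report = {}
    for module_name, attr in candidates:
        key = f"{module_name}.{attr}"
        try:
            module = importlib.import_module(module_name)
            getattr(module, attr)
            report[key] = "ok"
        except Exception as exc:  # broad on purpose: NameError can surface at import time
            report[key] = f"{type(exc).__name__}: {exc}"
    return report


# The three import paths mentioned in this thread.
CRAWL4AI_CANDIDATES = [
    ("crawl4ai", "WebCrawler"),
    ("crawl4ai.web_crawler", "WebCrawler"),
    ("crawl4ai", "AsyncWebCrawler"),
]

if __name__ == "__main__":
    for name, status in probe_imports(CRAWL4AI_CANDIDATES).items():
        print(f"{name}: {status}")
```

Pasting the printed lines into the issue shows at a glance which entry points the installed version exposes.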
@Themisstone Would you please try the recent version, 0.4.2? If it still doesn't work, can you share your Python version and operating system specs with me? I may have some ideas for you.
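The details requested above (Python version, operating system, installed package version) can be collected with the standard library alone. The following is a sketch; environment_report is a hypothetical helper name, and it assumes the package was installed under the distribution name "crawl4ai".

```python
import platform
from importlib import metadata


def environment_report(package="crawl4ai"):
    """Collect Python, OS and installed-package details into a small dict."""
    try:
        pkg_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        pkg_version = "not installed"
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "package": package,
        "package_version": pkg_version,
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

If package_version is not 0.4.2, upgrading first and re-running the failing import is the quickest check before opening a new issue.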
@Themisstone @asumansaree @gulnihalk Closing this issue due to inactivity. Please do try 0.4.2; if it still doesn't work, raise a new issue.