[Bug]: ImportError: cannot import name 'CrawlerRunConfig' from 'crawl4ai' (/app/crawl4ai/__init__.py)
crawl4ai version
crawl4ai-0.4.248
Expected Behavior
To be able to import CrawlerRunConfig as per examples in https://docs.crawl4ai.com/extraction/no-llm-strategies/
Current Behavior
When the script starts and tries to import CrawlerRunConfig, the import fails.
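One quick way to confirm the failure before the full app starts is a small guard; `check_import` here is a hypothetical diagnostic helper, not part of crawl4ai:

```python
import importlib

def check_import(module_name: str, attr: str) -> bool:
    """Return True if `module_name` can be imported and exposes `attr`."""
    try:
        mod = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(mod, attr)

# Usage against the failing import (result depends on the installed version):
# check_import("crawl4ai", "CrawlerRunConfig")
```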
Is this reproducible?
Yes
Inputs Causing the Bug
Issue happens during startup when import happens.
Steps to Reproduce
To reproduce, run the Dockerfile below with these two lines commented out:
RUN pip install -U crawl4ai
RUN crawl4ai-doctor
I added the above two lines to upgrade to the latest version, because I was unable to import the class "JsonXPathExtractionStrategy" and found "JsonXPATHExtractionStrategy" instead.
With those two lines added, I saw this in the build output when pip installed the package:
Attempting uninstall: crawl4ai
Found existing installation: Crawl4AI 0.3.745
Uninstalling Crawl4AI-0.3.745:
Successfully uninstalled Crawl4AI-0.3.745
Successfully installed cffi-1.17.1 crawl4ai-0.4.248
This confirmed that the docker image "unclecode/crawl4ai:all-amd64" ships an older version, which is why I tried to upgrade. However, even after the upgrade, the import still fails.
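One thing worth checking: the traceback below reports the import coming from `/app/crawl4ai/__init__.py`, i.e. a directory under the `WORKDIR`, not from site-packages, so a local folder may be shadowing the freshly installed package. A small helper can show where Python would resolve the module from; `module_origin` is a hypothetical diagnostic, not part of crawl4ai:

```python
import importlib.util
from typing import Optional

def module_origin(name: str) -> Optional[str]:
    """Return the file a module would be loaded from, without importing it."""
    spec = importlib.util.find_spec(name)
    return spec.origin if spec else None

# Usage inside the container; a path under /app rather than site-packages
# would mean a local directory is shadowing the installed package:
# module_origin("crawl4ai")
```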
Code snippets
###################################################################################
# Dockerfile
FROM unclecode/crawl4ai:all-amd64
WORKDIR /app
# Install required packages
RUN apt-get update && apt-get install -y python3 python3-pip \
python3-venv git xvfb libpq-dev gcc \
fluxbox x11vnc && apt-get clean && rm -rf /var/lib/apt/lists/*
# Set the DISPLAY variable to use the virtual display.
ENV DISPLAY=:99
# Create the session directory
RUN mkdir -p /app/session
# Copy the entrypoint script into the container.
COPY entrypoint.sh /entrypoint.sh
RUN chmod +x /entrypoint.sh
# Expose the VNC & API port so you can connect from outside the container.
EXPOSE 5900
EXPOSE 8080
# Copy the rest of the project files
COPY . .
# Install Python packages
RUN pip install --upgrade pip && pip install --no-cache-dir -r requirements.txt
# Added as part of testing, since the base image seems to be a little behind
RUN pip install -U crawl4ai
RUN crawl4ai-doctor
# Start Xvfb, a simple window manager (fluxbox), and x11vnc.
# Then, run your application (for example, "python /app/crawler.py").
CMD ["/entrypoint.sh"]
#################### Dockerfile end ###############################################
######################## docker-compose.yaml #############################
crawl4ai:
build:
context: ./crawler-app
dockerfile: Dockerfile
env_file:
- .env
environment:
- POSTGRES_DB_MAIN=${POSTGRES_DB_MAIN}
- POSTGRES_CRAWL4AI_USER=${POSTGRES_CRAWL4AI_USER}
- POSTGRES_CRAWL4AI_PASSWORD=${POSTGRES_CRAWL4AI_PASSWORD}
- POSTGRES_HOST=${POSTGRES_HOST}
- POSTGRES_PORT=${POSTGRES_PORT}
# LLM Provider Keys
#- OPENAI_API_KEY=${OPENAI_API_KEY:-}
#- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY:-}
#restart: "unless-stopped"
depends_on:
- postgres-db
ports:
# - "11235:11235"
# - "5900:5900"
- "8080:8080"
volumes:
- /dev/shm:/dev/shm
deploy:
resources:
limits:
memory: 4G
reservations:
memory: 1G
OS
Ubuntu 22.04 LTS running Docker
Python version
python3 (3.10 inside the container, per the traceback)
Browser
NA
Browser version
NA
Error logs & Screenshots (if applicable)
Traceback (most recent call last):
File "/app/crawler.py", line 4, in <module>
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
ImportError: cannot import name 'CrawlerRunConfig' from 'crawl4ai' (/app/crawl4ai/__init__.py)
Traceback (most recent call last):
File "/usr/local/bin/uvicorn", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 412, in main
run(
File "/usr/local/lib/python3.10/site-packages/uvicorn/main.py", line 579, in run
server.run()
File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 65, in run
return asyncio.run(self.serve(sockets=sockets))
File "/usr/local/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/local/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 69, in serve
await self._serve(sockets)
File "/usr/local/lib/python3.10/site-packages/uvicorn/server.py", line 76, in _serve
config.load()
File "/usr/local/lib/python3.10/site-packages/uvicorn/config.py", line 434, in load
self.loaded_app = import_from_string(self.app)
File "/usr/local/lib/python3.10/site-packages/uvicorn/importer.py", line 19, in import_from_string
module = importlib.import_module(module_str)
File "/usr/local/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/app/crawler.py", line 4, in <module>
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
ImportError: cannot import name 'CrawlerRunConfig' from 'crawl4ai' (/app/crawl4ai/__init__.py)
Crawler script finished. Sleeping for 60 seconds before restarting...
Traceback (most recent call last):
File "/app/crawler.py", line 4, in <module>
from crawl4ai import AsyncWebCrawler, CrawlerRunConfig
ImportError: cannot import name 'CrawlerRunConfig' from 'crawl4ai' (/app/crawl4ai/__init__.py)
Crawler script finished. Sleeping for 60 seconds before restarting...
As I found, there is a bunch of other stuff missing as well. It seems related to the version shipped in the public Docker image.
Your docs recommend using Docker, but the problem is that the image is old and missing many features the public docs describe.
Switching to a different base image (python) and then running:
RUN pip install -U crawl4ai
RUN playwright install
RUN playwright install-deps
removes the problem.
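That workaround can be sketched as a minimal Dockerfile; the base image tag and the script name are assumptions, not taken from the original setup:

```dockerfile
# Hypothetical minimal sketch of the python-based workaround
FROM python:3.10-slim
WORKDIR /app
RUN pip install --no-cache-dir -U crawl4ai && \
    playwright install && \
    playwright install-deps
# Copy project files last so code changes don't invalidate the install layers
COPY . .
CMD ["python", "crawler.py"]
```

Note that `COPY . .` will also copy any local `crawl4ai/` directory into `/app`, which could shadow the installed package, so it may be worth excluding such a directory via `.dockerignore`.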
Hey, I am getting this error even while using crawl4ai==0.5.0, any idea?
@mirozbiro @piyushptiwari1
Hi, we’ve already updated the Docker image. I just ran a simple no-LLM extraction strategy example, and it worked fine.
Could you please try using the latest Docker version and see if the issue still occurs? Also, if you can share the code you tested with, that would be really helpful for us to debug further.
I'll check over the next few days.
Hi @mirozbiro. I've tried this just now with our latest release 0.6.0 and I don't see this issue. Please try to pull the latest docker image from dockerhub and try again.
Reopen this issue if the problem still exists.
Sounds good. Thanks.