
Using FastAPI for Crawl4AI in a production environment, handling up to 50 concurrent requests.

YassKhazzan opened this issue 1 year ago

Hello, and thank you for building this amazing library.

I'm using crawl4ai in a production environment with up to 50 concurrent requests in a FastAPI application. The problem I have is memory usage. I'm building with Docker, and this is my Dockerfile:

FROM python:3.12-slim

WORKDIR /workspace
ENV HOME=/workspace

ADD . /workspace

RUN pip install -r requirements.txt

RUN playwright install chromium
RUN playwright install-deps

EXPOSE 8585

CMD ["gunicorn", "main:app", \
     "--workers", "8", \
     "--worker-class", "uvicorn.workers.UvicornWorker", \
     "--bind", "0.0.0.0:8585", \
     "--timeout", "120", \
     "--keep-alive", "5", \
     "--max-requests", "500", \
     "--max-requests-jitter", "50", \
     "--log-level", "info", \
     "--access-logfile", "-"]

I tried two methods for handling crawl4ai. The first uses FastAPI's lifespan, where I create a global crawler:

import asyncio
from contextlib import asynccontextmanager

from fastapi import FastAPI
from crawl4ai import AsyncWebCrawler

# Global AsyncWebCrawler instance shared across all requests
crawler = None

@asynccontextmanager
async def lifespan(app_start: FastAPI):
    # Startup: create and enter the AsyncWebCrawler once for the whole app
    global crawler
    crawler = AsyncWebCrawler(verbose=False, always_by_pass_cache=True)
    await crawler.__aenter__()
    yield
    # Shutdown: close the crawler and release the browser
    if crawler:
        await crawler.__aexit__(None, None, None)

app = FastAPI(lifespan=lifespan)

# Cap the number of pages being crawled concurrently
scraping_semaphore = asyncio.Semaphore(10)
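
Each request then goes through the shared crawler while holding the semaphore, roughly like this (a simplified sketch rather than my exact handler; ScrapeRequest is the same model used in the second approach below):

@app.post("/crawl_urls")
async def crawl_urls(request: ScrapeRequest):
    async def limited_crawl(url: str) -> str:
        # At most 10 pages are rendered at a time; the browser itself is shared.
        async with scraping_semaphore:
            result = await crawler.arun(url=url, bypass_cache=True)
            return result.markdown

    return await asyncio.gather(*(limited_crawl(u) for u in request.urls))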

With this approach, memory usage keeps increasing indefinitely, requiring a server reboot every three days to keep it running smoothly, even with a Semaphore set to 10.


Alternatively, I’ve tried using the crawler without a global instance. With this approach, I experience memory spikes, but they eventually return to normal. Additionally, with 10 concurrent requests running on a server with 4 vCPUs and 16 GB of RAM, the response time averages around 20 seconds.

@app.post("/crawl_urls")
async def crawl_urls(request: ScrapeRequest):
    try:
        #print(f"Received {request.urls} urls to scrape")
        if not request.urls:
            return []
        tasks = [process_url(url) for url in request.urls]
        results = await asyncio.gather(*tasks)
        return results
    except Exception as e:
        #print(f"Error in scrape_urls: {e}")
        return []

async def process_url(url):
    try:
        if await is_pdf(url):
            return ''
        #start_time = time.time()
        result = await crawl_url(url)
        return result

    except Exception as e:
        #print(f"Error processing {url}: {e}")
        return ''

async def crawl_url(url):
    try:
        # A fresh crawler (and browser instance) is created for every URL here
        async with AsyncWebCrawler(verbose=False, always_by_pass_cache=True) as crawler:
            result = await crawler.arun(url=url, verbose=False, bypass_cache=True)
            # print(result.markdown)
            return result.markdown
    except Exception as e:
        print(f"error in crawl4ai {e}")
        return ''

# I'm bypassing the cache to test concurrent requests

I’m not sure if there are specific settings I can adjust to improve performance and reduce memory usage. Any advice on optimizing this setup would be greatly appreciated.

P.S.: I also tried using arun_many, but it didn’t result in any performance improvement.
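
For reference, the arun_many attempt looked roughly like this (a simplified sketch; it assumes extra keyword arguments such as bypass_cache are forwarded to each underlying arun call):

# Sketch of the arun_many variant: one crawler instance handles the whole
# batch instead of opening a browser per URL.
async def crawl_many(urls: list[str]) -> list[str]:
    async with AsyncWebCrawler(verbose=False, always_by_pass_cache=True) as crawler:
        results = await crawler.arun_many(urls=urls, bypass_cache=True)
        return [r.markdown for r in results]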

YassKhazzan avatar Oct 21 '24 10:10 YassKhazzan

Similar here; I'd be interested in a solution.

gsogol avatar Oct 21 '24 21:10 gsogol

@YassKhazzan Thank you for using our library. We are aiming to release a Dockerfile this weekend with some adjustments for this, and I'm also preparing some deployment examples; hopefully by next week we will have a couple of approaches ready.

One very interesting option is Modal, which lets you run this crawler as a function in the cloud; when I tested it, the performance was really good.
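
A rough sketch of what that could look like (not the official example we're preparing; the image setup and function body here are just an illustration):

import modal

# Image with crawl4ai and the Playwright Chromium browser installed.
image = (
    modal.Image.debian_slim()
    .pip_install("crawl4ai")
    .run_commands("playwright install --with-deps chromium")
)

app = modal.App("crawl4ai-demo")

@app.function(image=image)
async def crawl(url: str) -> str:
    # Import inside the function so it only needs to exist in the remote image.
    from crawl4ai import AsyncWebCrawler
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url=url)
        return result.markdown

Calling crawl.remote(url) would then run each crawl in its own container, so Modal handles the concurrency instead of a single server.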

The other thing is that the arun_many function is a temporary way to crawl multiple URLs; it's not efficient at all. Right now we are working on and testing our scraper module, which will be released very soon and is designed to be efficient. So far the focus has been on crawling one link in a very efficient and proper way, and then using that to build a scraper.

Ergo, I personally do not suggest using arun_many; better to wait for the Scraper module. Hopefully, very soon, we're going to release more examples and also the scraper module to get the best out of asynchronous crawling.

Please join the conversation in that issue, where I plan to share more; you can also see the Modal example there: https://github.com/unclecode/crawl4ai/issues/180

unclecode avatar Oct 24 '24 11:10 unclecode

Thanks @unclecode for your response. I joined the other discussion and will wait for the update.

YassKhazzan avatar Oct 24 '24 11:10 YassKhazzan

You're welcome @YassKhazzan

unclecode avatar Oct 24 '24 12:10 unclecode

Hi @unclecode! Congrats on the amazing job.

Can you share the Scraper Module status with us?

devellgit avatar Dec 18 '24 16:12 devellgit

@devellgit It's under review; I'm doing my best to make it available soon, I really want it :))

unclecode avatar Dec 24 '24 10:12 unclecode

from fastapi import FastAPI
from pydantic import BaseModel
import asyncio
import time
from crawl4ai import AsyncWebCrawler

app = FastAPI()
# Only one crawl (and therefore one browser) at a time.
semaphore = asyncio.Semaphore(1)

class SingleCrawlRequest(BaseModel):
    url: str

@app.get("/")
async def health():
    return {"success": True}


@app.post("/crawl")
async def crawl(request: SingleCrawlRequest):
    # Acquire the semaphore before launching the browser, so queued requests
    # don't each hold a Chromium instance while they wait.
    async with semaphore:
        async with AsyncWebCrawler() as crawler:
            start = time.perf_counter()
            try:
                result = await crawler.arun(url=request.url)
                elapsed = time.perf_counter() - start
                return {
                    "success": True,
                    "error": None,
                    "data": {
                        "url": request.url,
                        "rawHtml": result.html,
                        "responseHeader": result.response_headers,
                        "responseStatusCode": result.status_code
                    },
                    "time_taken": elapsed
                }
            except Exception as e:
                elapsed = time.perf_counter() - start
                return {
                    "success": False,
                    "error": str(e),
                    "data": {
                        "url": request.url,
                        "rawHtml": None
                    },
                    "time_taken": elapsed
                }

I used this in production on AWS ECS with 4 tasks, each with 4 vCPU and 8 GB RAM.

But it's unable to handle 200 concurrent requests.

What should be the correct approach?

zeeshan-tahammul avatar Jun 04 '25 14:06 zeeshan-tahammul