[Bug]: After the crawl4ai container runs for a few minutes, a large number of Chrome processes remain, consuming CPU and memory. I hope this can be fixed.

Open · ganecheng opened this issue 9 months ago · 13 comments

crawl4ai version

docker image tag: unclecode/crawl4ai:basic-amd64

Expected Behavior

When crawl4ai finishes crawling a web page, the Chrome process it used should be released immediately, so that large numbers of Chrome processes don't remain in the background.

Or is there a container configuration option that achieves this?

Current Behavior

[screenshot]

Is this reproducible?

Yes

Inputs Causing the Bug

curl --location 'http://127.0.0.1:11235/crawl' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer 12345678' \
--data '{
    "urls": "test url here",
    "crawler_params": {
        "headless": true,
        "page_timeout": 15000,
        "remove_overlay_elements": true,
        "semaphore_count": 1
    },
    "extra": {
        "word_count_threshold": 20,
        "bypass_cache": true,
        "only_text": true,
        "process_iframes": false
    }
}'

Steps to Reproduce

docker run -p 11235:11235 --env CRAWL4AI_API_TOKEN=12345678 --env MAX_CONCURRENT_TASKS=1 --name crawl4ai -m 4G --restart unless-stopped unclecode/crawl4ai:basic-amd64
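
To make the buildup easier to observe, here is a minimal sketch (illustrative, not part of the original report) that replays the same /crawl request in a loop and counts the Chrome processes left inside the container between requests. The token and crawler_params mirror the curl call above; the target URL is a placeholder, and the process count assumes ps is available inside the image:

import subprocess
import time

import requests

API = "http://127.0.0.1:11235/crawl"
HEADERS = {"Content-Type": "application/json", "Authorization": "Bearer 12345678"}
PAYLOAD = {
    "urls": "https://example.com",  # placeholder; substitute the URL you are testing with
    "crawler_params": {"headless": True, "page_timeout": 15000,
                       "remove_overlay_elements": True, "semaphore_count": 1},
    "extra": {"word_count_threshold": 20, "bypass_cache": True,
              "only_text": True, "process_iframes": False},
}

def chrome_process_count() -> int:
    # Counts chrome/chromium processes inside the container (assumes `ps` exists in the image).
    out = subprocess.run(
        ["docker", "exec", "crawl4ai", "ps", "-e", "-o", "comm="],
        capture_output=True, text=True, check=False,
    ).stdout
    return sum("chrom" in name for name in out.splitlines())

for i in range(50):
    requests.post(API, headers=HEADERS, json=PAYLOAD, timeout=60)
    time.sleep(5)  # crude wait instead of polling the task endpoint, just to keep the sketch short
    print(f"request {i + 1}: {chrome_process_count()} chrome processes still alive")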

Code snippets


OS

Linux Docker

Python version

3.10.15

Browser

No response

Browser version

No response

Error logs & Screenshots (if applicable)

No response

ganecheng · Apr 05 '25

Same problem. I'm trying to crawl a list of ~350 URLs, and I even have timers to slow down the retries, but after a while the load average on my server spikes super high and I see a bazillion Chrome processes running. I'm using the dockerized version with n8n. It's like it never closes the browser process after it completes a crawl.

imonroe · Apr 10 '25

Yeah, after about 150 URLs, it's just consumed every available resource on the server with Chrome threads.

[screenshot]

imonroe · Apr 10 '25

I suppose it'd be helpful to include some information about my config. As mentioned, I'm running the docker version.
For my environment, I have MAX_CONCURRENT_TASKS=5. My API call is for a single URL at priority: 10. Additionally:

"extra": {
  "word_count_threshold": 20,
  "only_text": True
},
"crawler_params": {
  "headless": True,
  "page_timeout": 6000,
  "verbose": False,
  "semaphore_count": 5,
  "use_managed_browser": False
}

imonroe · Apr 10 '25

I ended up running 10 of them and killing each one after a random number of seconds, like 120-300. I had to implement smart-retry for this.
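
For reference, a rough sketch of that kind of watchdog, assuming ten containers with hypothetical names crawl4ai_1 through crawl4ai_10; in-flight requests will fail around each restart, which is why the smart retry on the client side is needed:

import random
import subprocess
import time

# Hypothetical container names; adjust to however the ten instances are actually named.
CONTAINERS = [f"crawl4ai_{i}" for i in range(1, 11)]

# Schedule every container's next restart at a random point 120-300 s from now.
next_restart = {name: time.time() + random.uniform(120, 300) for name in CONTAINERS}

while True:
    now = time.time()
    for name, due in next_restart.items():
        if now >= due:
            # Restarting the container reaps any lingering Chrome processes along with it.
            subprocess.run(["docker", "restart", name], check=False)
            next_restart[name] = time.time() + random.uniform(120, 300)
    time.sleep(5)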

d0rc · Apr 20 '25

Same issue. It's crazy; it seems to leak resources all the time.

ppaanngggg · Apr 24 '25

Same here... Lots of processes, and memory usage keeps rising.

If you need help debugging, don't hesitate to ask.

Thanks for the great product.

joaomnmoreira · May 21 '25

Same problem. This is the point at which I deployed Crawl4AI on Railway; since then, CPU usage has kept increasing even without any crawl tasks.

[screenshot]

voidaugust · Jun 03 '25

Anyone able to find a solution to this? Having the same issues.

tanushshukla · Jun 29 '25

Me too, having the same issues.

Gary-666 · Jul 22 '25

@unclecode Any ideas?

pleomax0730 · Jul 28 '25

Same problem.

TideDra · Aug 28 '25

Hi everyone, thanks for all the detailed reports, and sorry for the resource headache.

We've seen exactly what you're describing in earlier Docker builds: when a lot of short-lived crawls run back-to-back, Chrome contexts accumulate faster than the old cleanup loop could reclaim them. Starting in Crawl4AI v0.7.7 (the "Self-Hosting & Monitoring" release), the container runtime has been rewritten around a smarter browser pool:

  • Always-on "hot" pool + auto-pruned "cold" pool: frequently used configs stay warm, while one-off configs are cleaned up as soon as they go idle, so you don't end up with dozens of orphaned Chrome processes.

  • Integrated monitoring dashboard + REST/WebSocket APIs: you can see in real time how many browsers/contexts are alive, CPU/memory per worker, and force cleanup if you spot a buildup.

  • Improved lifecycle management: managed browsers are now tied to the crawl task itself; when a crawl finishes we close the page/context immediately unless you explicitly keep a session open.

Could you upgrade to the latest release (0.7.7) and see if you still face similar issues? Thanks!

SohamKukreti · Nov 20 '25

Hey @SohamKukreti, just chiming in on this.

I'm on 0.7.7 and the memory leak is still going crazy. Not sure what I should do; currently I just kill the browser manually in the monitoring view using the red cross next to it.

Some details: the server is 16 vCPU / 32 GB RAM, Crawl4AI is deployed with docker-compose (see below), and I'm calling it through the API (also below). With a concurrency of 16 and a backlog of 75k websites to scrape, RAM is already at 75%+ after maybe 1,000 sites, and after another ~500 (so roughly 1,500-2,000 total) it reaches 99%+. The OOM killer kicks in, but then the running jobs hang indefinitely and new ones hang too, so I can end up with 1,100+ scrape jobs "in action" with the counter increasing but nothing resolving. The only fix I've found in that state is to restart the container...

Docker-compose:

version: '3.8'
services:
  crawl4ai:
    image: unclecode/crawl4ai:latest
    container_name: crawl4ai
    ports:
      - "0.0.0.0:11235:11235"
    environment:
      - CRAWL4AI_ENV=prod
      - MAX_CONCURRENT_TASKS=25
      - MEMORY_THRESHOLD_PERCENT=80
      - PYTHONUNBUFFERED=1
    volumes:
      - ./crawl4ai_data:/data
      - ./logs:/app/logs
    shm_size: '2gb'
    mem_limit: 28G
    mem_reservation: 16G
    deploy:
      resources:
        limits:
          cpus: '15'
          memory: 28G
        reservations:
          cpus: '8'
          memory: 16G
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:11235/health"]
      interval: 30s
      timeout: 10s
      retries: 3

How I'm calling it:

      const requestBody: Crawl4AIRequest = {
        urls: [website],
        browser_config: {
          type: 'BrowserConfig',
          params: {
            extra_args: [
              '--disable-dev-shm-usage',
              '--disable-gpu',
              '--no-sandbox',
            ],
          },
        },
        crawler_config: {
          type: 'CrawlerRunConfig',
          params: {
            magic: true,
            cache_mode: 'bypass',
            page_timeout: 60000,
            markdown_generator: {
              type: 'DefaultMarkdownGenerator',
              params: {},
            },
            ...(withProxy
              ? {
                  proxy_config: {
                    type: 'ProxyConfig',
                    params: {
                      server: this.proxyServer,
                      username: this.proxyUsername,
                      password: this.proxyPassword,
                    },
                  },
                }
              : {}),
          },
        },
      };

      const response = await fetch(`${this.apiUrl}/crawl`, {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
          ...(this.apiSecret ? { 'X-API-Key': `${this.apiSecret}` } : {}),
        },
        body: JSON.stringify(requestBody),
      });
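
To track the climb from the host, independently of the monitoring dashboard, here is a small sketch that samples the container's memory percentage and process count once a minute via docker stats (the container name is taken from the compose file above):

import subprocess
import time

while True:
    # --no-stream returns a single sample; MemPerc and PIDs are standard docker stats format fields.
    result = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", "{{.MemPerc}} {{.PIDs}}", "crawl4ai"],
        capture_output=True, text=True, check=False,
    )
    print(time.strftime("%Y-%m-%d %H:%M:%S"), result.stdout.strip())
    time.sleep(60)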

If you need more info, or need access to the VPS to debug it "live", you can ping me on Discord or here!

-- Update: I have something: everything is closed (no browser active, not even the persistent one), the monitoring dashboard is broken (nothing shows up), and the container is still using 80%+ of memory while doing nothing. It's also using 50%+ of one core, even though there's nothing to scrape.

No browser open (as per the monitoring UI), but here is top: https://pastebin.com/HpXTKgg0 and ps aux: https://pastebin.com/mAw1Fi6t (pastebins because of Discord's length limit).

-- Update 2: Trying to find a pattern in the logs, but I can't. I initially thought it would only happen on failures (timeout, crash, ...), but that doesn't seem to be the case. A small example of big leaks in a short amount of time (some pages seem to produce no leak at all, while others produce 15 MB??):

2025-11-25 10:28:42,427 - api - INFO - Memory usage: Start: 261.49609375 MB, End: 275.12109375 MB, Delta: 13.625 MB, Peak: 275.12109375 MB
[ERROR]... × Error updating image dimensions: Page.evaluate: Execution context was destroyed, most likely because of a navigation
[FETCH]... ↓ https://victoire.be/fr/?utm_source=openai | ✓ | ⏱: 5.42s
[SCRAPE].. ◆ https://victoire.be/fr/?utm_source=openai | ✓ | ⏱: 0.09s
[COMPLETE] ● https://victoire.be/fr/?utm_source=openai | ✓ | ⏱: 5.51s
2025-11-25 10:28:42,629 - api - INFO - Memory usage: Start: 261.49609375 MB, End: 275.74609375 MB, Delta: 14.25 MB, Peak: 275.74609375 MB
[FETCH]... ↓ https://www.google.com/maps/search/Espace+Immo+Brussels,+Bruxelles,+Belgique?utm_source=openai | ✓ | ⏱: 4.70s
[SCRAPE].. ◆ https://www.google.com/maps/search/Espace+Immo+Brussels,+Bruxelles,+Belgique?utm_source=openai | ✓ | ⏱: 0.25s
[COMPLETE] ● https://www.google.com/maps/search/Espace+Immo+Brussels,+Bruxelles,+Belgique?utm_source=openai | ✓ | ⏱: 4.95s
2025-11-25 10:28:43,001 - api - INFO - Memory usage: Start: 261.49609375 MB, End: 276.87109375 MB, Delta: 15.375 MB, Peak: 276.87109375 MB
[FETCH]... ↓ http://www.happyhouses.be/?utm_source=openai | ✓ | ⏱: 5.54s
[SCRAPE].. ◆ http://www.happyhouses.be/?utm_source=openai | ✓ | ⏱: 0.02s
[COMPLETE] ● http://www.happyhouses.be/?utm_source=openai | ✓ | ⏱: 5.57s
2025-11-25 10:28:43,038 - api - INFO - Memory usage: Start: 261.49609375 MB, End: 276.87109375 MB, Delta: 15.375 MB, Peak: 276.87109375 MB
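
A quick, illustrative sketch for pulling the End values out of these api memory lines, so the trend is easier to see than by eyeballing the deltas (assumes the logs are piped in, e.g. from docker logs crawl4ai):

import re
import sys

# Matches the "Memory usage" lines emitted by the api logger in the excerpt above.
PATTERN = re.compile(r"Memory usage: Start: [\d.]+ MB, End: ([\d.]+) MB, Delta: ([\d.]+) MB")

previous_end = None
for line in sys.stdin:
    match = PATTERN.search(line)
    if not match:
        continue
    end = float(match.group(1))
    delta = float(match.group(2))
    growth = end - previous_end if previous_end is not None else 0.0
    print(f"end={end:.2f} MB  reported_delta={delta:.2f} MB  growth_since_previous={growth:.2f} MB")
    previous_end = end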

Martichou · Nov 24 '25