
[Bug]: Memory Leak and StreamingResponse not working

Open RemeLards opened this issue 8 months ago • 7 comments

crawl4ai version

0.5.0post8

Expected Behavior

Crawl4ai not crashing because of memory leaks, and StreamingResponse not returning an empty byte string (b'')

Current Behavior

When Crawl4ai runs for a long time it eventually reaches maximum memory usage, to the point where, when using it in K8s, the pod gets killed and restarted. Although this has improved over time, it still has memory issues.

And while trying to read the StreamingResponse with an aiohttp client, it keeps returning an empty byte string (b'').
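A minimal sketch of the kind of aiohttp client read described above. The `/crawl/stream` path, port, and payload shape are assumptions for illustration, not the confirmed server API; the empty-chunk guard shows one way to tolerate the `b''` values reported here rather than a fix for them.

```python
# Sketch of reading a crawl4ai streaming endpoint with aiohttp.
# Endpoint path, port, and payload shape below are ASSUMPTIONS;
# adjust them to whatever the deployed server actually exposes.
import json


def decode_ndjson_chunk(raw: bytes):
    """Decode one newline-delimited JSON chunk, skipping empty
    chunks (the b'' values reported in this issue)."""
    if not raw or not raw.strip():
        return None  # empty chunk: ignore instead of treating it as data
    return json.loads(raw)


async def consume_stream(url: str, payload: dict):
    import aiohttp  # third-party; imported here so the helper above stays stdlib-only

    async with aiohttp.ClientSession() as session:
        async with session.post(url, json=payload) as resp:
            async for line in resp.content:  # yields bytes, line by line
                doc = decode_ndjson_chunk(line)
                if doc is not None:
                    print(doc.get("url"))


# Usage against a running server (host/port and payload assumed):
#   import asyncio
#   asyncio.run(consume_stream("http://localhost:11235/crawl/stream",
#                              {"urls": ["https://example.com"], "stream": True}))
```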

Is this reproducible?

Yes

Inputs Causing the Bug


Steps to Reproduce


Code snippets


OS

Linux Docker

Python version

3.10

Browser

Chrome

Browser version

No response

Error logs & Screenshots (if applicable)

No response

RemeLards avatar Apr 12 '25 22:04 RemeLards

Also having this issue, and the proposed fix in the PR doesn't seem to solve it. When running the crawler in ECS, after a few hours the memory usage spikes to 100% and I need to restart all the tasks.

Image

viraj-lunani avatar Apr 15 '25 21:04 viraj-lunani

@RemeLards @viraj-lunani Hi, can you guys share a code snippet that causes this problem? This will help us reproduce it and check it out.

ntohidi avatar May 12 '25 10:05 ntohidi

@ntohidi In version 0.5.0post8 the StreamingResponse wasn't working, and the memory leak was a problem in long-running crawls: when many requests were created over hours, it seems the browser sometimes wasn't closed properly.

I've tried many things: a BFS strategy implemented in my own code (when using crawl4ai's built-in BFS strategy it would just crash), and requesting N URLs at the same time (hundreds of them). All of those would crash the crawl4ai API you provide in the repo (I'm not using the library from pip; I'm using it as a server/service, since you already provide the FastAPI setup). What extended the app's lifetime was crawling in small URL batches (around 10-20 URLs per request), but even then it accumulates memory over time:

Image
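The small-batch workaround described above can be sketched as follows. The `/crawl` endpoint, port, and payload shape are assumptions for illustration; only the batching itself reflects what the reporter describes.

```python
# Sketch of the small-batch workaround: send URLs to the crawl4ai
# server in chunks of 10-20 instead of hundreds at once.
# The /crawl endpoint, port, and payload shape are ASSUMPTIONS.
import json
import urllib.request


def batched(urls, size=20):
    """Split a URL list into fixed-size batches."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]


def crawl_in_batches(server: str, urls, size=20):
    results = []
    for batch in batched(urls, size):
        req = urllib.request.Request(
            f"{server}/crawl",                      # assumed endpoint
            data=json.dumps({"urls": batch}).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:   # one request per small batch
            results.append(json.load(resp))
    return results


# Usage against a running server (host/port assumed):
#   crawl_in_batches("http://localhost:11235", list_of_urls, size=10)
```

Smaller batches only delay the problem the reporter observed; they keep any per-request leak small enough that the process survives longer between restarts.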

Is the memory management better in the new version you released a few weeks ago? Is streaming working in the new version too? It's better to confirm those things before migrating; any problem that persists I will report with a code example.

RemeLards avatar May 27 '25 13:05 RemeLards

@ntohidi Another example:

Image

At 19:00 K8s killed it because it exceeded the memory limit, for example. From 20:00 until 07:00 I was running a crawling test; it started at 800MB and is now using 4GB without doing anything.

RemeLards avatar May 28 '25 14:05 RemeLards

Can you update to the latest version (0.7.7+) and see if it fixes this issue?

SohamKukreti avatar Nov 18 '25 13:11 SohamKukreti

@SohamKukreti Seems like it got better. We are on version 0.6.x; I have not had time to update it since I need to change the auth in the source code. I can check Prometheus and share the results ASAP.

RafaelFFAumo avatar Nov 20 '25 03:11 RafaelFFAumo

@RafaelFFAumo Okay, thanks! When you do update, let us know!

SohamKukreti avatar Nov 20 '25 16:11 SohamKukreti

@SohamKukreti Sorry for being late, I've been really busy lately, but I managed to get a graph for you.

Image

Seems like 0.6.X (I don't know which specific 0.6 version I was using) fixed the memory leak, or at least made it much better. I think part of the remaining memory may be URL content caching (which I believe is present in your software), but it's way better than GBs of leaks. I'm currently finishing my graduation, but I'm looking forward to updating to 0.7.X, or even straight to 0.8.X (if it's released by then), when I get the time.

RafaelFFAumo avatar Dec 03 '25 02:12 RafaelFFAumo