[Bug]: Memory Leak and StreamingResponse not working
crawl4ai version
0.5.0post8
Expected Behavior
Crawl4ai should not crash because of memory leaks, and the StreamingResponse should not return an empty byte string (b'').
Current Behavior
When Crawl4ai runs for a long time it reaches maximum memory usage, to the point where, when deployed in K8s, it gets killed and restarted. Although this has improved over time, it still has memory issues.
Additionally, when trying to read the StreamingResponse with an aiohttp client, it keeps returning an empty byte string (b'').
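A minimal sketch of how the stream is being read (the server address, endpoint path, and payload shape are assumptions about the Docker API and may differ per deployment):

```python
import asyncio

import aiohttp

# Hypothetical base URL for the crawl4ai Docker server; the endpoint
# path and payload shape are assumptions and may differ per version.
SERVER = "http://localhost:11235"

async def read_stream():
    async with aiohttp.ClientSession() as session:
        async with session.post(
            f"{SERVER}/crawl/stream",
            json={"urls": ["https://example.com"]},
        ) as resp:
            # Read the body incrementally; each chunk should be part of
            # the streamed result, but only b'' ever comes back.
            async for chunk in resp.content.iter_chunked(1024):
                print(chunk)

asyncio.run(read_stream())
```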
Is this reproducible?
Yes
Inputs Causing the Bug
Steps to Reproduce
Code snippets
OS
Linux Docker
Python version
3.10
Browser
Chrome
Browser version
No response
Error logs & Screenshots (if applicable)
No response
I'm also having this issue, and the proposed fix in the PR doesn't seem to solve it. When running the crawler in ECS, the memory usage spikes to 100% after a few hours and I need to restart all the tasks.
@RemeLards @viraj-lunani Hi, can you guys share a code snippet that causes this problem? This will help us reproduce it and check it out.
@ntohidi In version 0.5.0post8 the StreamingResponse wasn't working, and the memory leak was a problem in long-running crawls: when many requests were created over hours, it seems the browser sometimes wasn't closed properly.
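For the library-level equivalent, a minimal sketch of the context-manager usage that should close the browser deterministically (class and parameter names from the documented crawl4ai API; this is what I would expect "proper closing" to look like):

```python
import asyncio

from crawl4ai import AsyncWebCrawler, BrowserConfig, CrawlerRunConfig

async def main():
    browser_config = BrowserConfig(headless=True)
    # The async context manager starts the browser on entry and should
    # close it on exit, even if a crawl raises.
    async with AsyncWebCrawler(config=browser_config) as crawler:
        result = await crawler.arun(
            url="https://example.com",
            config=CrawlerRunConfig(),
        )
        print(result.success)

asyncio.run(main())
```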
I've tried many things: the BFS strategy (which I implement in my own code, because crawl4ai's built-in BFS strategy would just crash), and requesting N URLs at the same time (hundreds of them). All of those would crash the crawl4ai API you provide in the repo (I'm not using the library from pip; I'm using it as a server/service, since you already provide the FastAPI app). What extended the app's lifetime was sending small URL batches (10-20 URLs per request, see the sketch below), but even then memory accumulates over time.
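Roughly what the batching looks like (a sketch; the server address, endpoint path, and payload shape are assumptions about the Docker API and may differ per version):

```python
import asyncio

import aiohttp

SERVER = "http://localhost:11235"  # hypothetical server address
BATCH_SIZE = 20

async def crawl_in_batches(urls: list[str]):
    async with aiohttp.ClientSession() as session:
        # Send small batches sequentially instead of hundreds of URLs
        # in one request; this kept the service alive longer, but
        # memory still grows over time.
        for i in range(0, len(urls), BATCH_SIZE):
            batch = urls[i:i + BATCH_SIZE]
            async with session.post(
                f"{SERVER}/crawl",
                json={"urls": batch},  # payload shape is an assumption
            ) as resp:
                body = await resp.read()  # drain the response
                print(f"batch {i // BATCH_SIZE}: status={resp.status}, "
                      f"{len(body)} bytes")

asyncio.run(crawl_in_batches(["https://example.com"] * 100))
```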
Is memory management better in the new version you released a few weeks ago? Is streaming working in the new version too? It's better to just confirm those things before migrating; any problem that persists I will report with a code example.
@ntohidi Another example:
At 19:00 K8s killed it because it exceeded the memory limit. From 20:00 until 07:00 I was running a crawling test; it started at 800 MB and is now using 4 GB without doing anything.
Can you update to the latest version (0.7.7+) and see if it fixes this issue?
@SohamKukreti Seems like it got better. We're on 0.6.x; I haven't had time to update since I need to change the auth in the source code. I can check Prometheus and share the results ASAP.
@RafaelFFAumo Okay, thanks! When you do update, let us know!
@SohamKukreti Sorry for the late reply, I've been really busy lately, but I managed to get a graph for you.
It seems 0.6.X (I don't know which specific 0.6 version I was using) fixed the memory leak, or at least made it much better. I think part of the remaining memory may be URL content caching (which I believe is present in your software), but it's way better than gigabytes of leaks. I'm currently finishing my degree, but I'm looking forward to updating to 0.7.X, or even straight to 0.8.X (if it's released by then), when I get the time. If the cache really is where the memory goes, one thing I want to test is bypassing it per request, as sketched below.
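A minimal sketch of that test (CacheMode and CrawlerRunConfig are from the documented crawl4ai API; whether this maps onto the server deployment is an assumption I still need to check):

```python
import asyncio

from crawl4ai import AsyncWebCrawler, CacheMode, CrawlerRunConfig

async def main():
    # Bypass the URL content cache entirely for this run, to check
    # whether the residual memory growth tracks the cache or not.
    run_config = CrawlerRunConfig(cache_mode=CacheMode.BYPASS)
    async with AsyncWebCrawler() as crawler:
        result = await crawler.arun(url="https://example.com", config=run_config)
        print(result.success)

asyncio.run(main())
```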