The memory usage shown by the snapshotter is inconsistent with that shown by htop.
from datetime import timedelta

from crawlee import service_locator

configuration = service_locator.get_configuration()
# configuration.purge_on_start = False
configuration.write_metadata = False
configuration.persist_storage = False
configuration.available_memory_ratio = 0.8
configuration.internal_timeout = timedelta(minutes=5)
It gradually becomes sluggish until the program freezes completely.
configuration.available_memory_ratio = 0.8
You are setting the available memory ratio to 80%. htop reports 7.65 GB in total, and 80% of that is 6.12 GB, so the limit matches exactly.
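For reference, the arithmetic behind that limit can be checked in a couple of lines. This is only a sketch using the figures quoted in this thread; the variable names are illustrative and are not Crawlee identifiers.

```python
# Figures taken from this thread: total memory reported by htop and the
# ratio assigned to configuration.available_memory_ratio.
total_memory_gb = 7.65
available_memory_ratio = 0.8

# The memory limit is the total scaled by the ratio.
memory_limit_gb = total_memory_gb * available_memory_ratio
print(f"{memory_limit_gb:.2f} GB")
```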
However, the memory usage shown by htop is around 3.5 GB. Crawlee actually shows that it used 6.99 GB.
From the screenshot I see 3.546/7.65 GB used, while it is printing something about 6.05 GB being used. So I guess the issue is not with the limit calculation, but with the actual value of the used memory.
Looking into the code, I see we sum up the memory usage of the process and all its children.
And from docs we can get to this blog post where this line might be particularly interesting:
RSS (Resident Set Size), which is what most people usually rely on, is misleading because it includes both the memory which is unique to the process and the memory shared with other processes.
So here is my wild guess: maybe we count some memory twice (or multiple times) if we sum up the usage of children that are using the same shared memory?
I will continue looking into this with some tests.
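To make the guess concrete, here is a toy model of how summing RSS over a process tree could overcount. It does not use Crawlee's code or measure real processes; `ToyProcess`, the process count, and all sizes are hypothetical numbers chosen only to illustrate the effect.

```python
from dataclasses import dataclass


@dataclass
class ToyProcess:
    """Hypothetical process: private pages plus one mapped shared segment."""

    private_mb: int  # memory only this process maps
    shared_mb: int   # size of the shared segment this process maps

    @property
    def rss_mb(self) -> int:
        # RSS includes both the private and the shared resident pages.
        return self.private_mb + self.shared_mb


# Assumed scenario: a parent and three children all map the same 1000 MB segment.
SHARED_SEGMENT_MB = 1000
processes = [
    ToyProcess(private_mb=500, shared_mb=SHARED_SEGMENT_MB),  # parent
    ToyProcess(private_mb=200, shared_mb=SHARED_SEGMENT_MB),  # child 1
    ToyProcess(private_mb=200, shared_mb=SHARED_SEGMENT_MB),  # child 2
    ToyProcess(private_mb=200, shared_mb=SHARED_SEGMENT_MB),  # child 3
]

# Naive sum of RSS counts the shared segment once per process.
summed_rss_mb = sum(p.rss_mb for p in processes)

# Physical usage counts the shared segment only once.
actual_mb = sum(p.private_mb for p in processes) + SHARED_SEGMENT_MB

print(f"summed RSS: {summed_rss_mb} MB, actual: {actual_mb} MB")
```

In this made-up scenario the naive sum is more than twice the physical usage, which is the kind of gap the hypothesis would predict if the children really share a large mapping.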
Yes, but visually, you can see that 3.5 GB is the green part, and you still have a huge yellow part, which should be cache. Summing those, I'd say it could be about 6 GB.
You could try setting the memory explicitly to 16 GB and see whether things fall apart because of OOM (which I would read as us counting things correctly and you misreading what htop is showing) or run fine (which would confirm it's something on our end).
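A sketch of that experiment, in the same style as the configuration snippet from the original report. It assumes that `memory_mbytes` is the configuration field holding an explicit limit in megabytes; please check the field name against the Crawlee version in use before relying on it.

```python
from crawlee import service_locator

configuration = service_locator.get_configuration()

# Assumption: `memory_mbytes` sets an explicit memory limit in MB, used
# here in place of the limit derived from available_memory_ratio.
configuration.memory_mbytes = 16 * 1024  # 16 GB
```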
memory shared with other processes.
This is still occupied memory; it's not relevant who uses it, right?
Yes, but it should be counted only once. Maybe we count it multiple times by summing up the memory usage of child processes that use the same portion of shared memory? But I have to run some tests first to see whether that assumption is correct.
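One possible way to run such a test on Linux is to compare RSS with PSS (proportional set size), which divides each shared page among the processes mapping it, so that summing PSS across processes does not count shared pages repeatedly. The sketch below reads both values from `/proc/<pid>/smaps_rollup`; the helper names are made up for illustration, and this is not how Crawlee currently measures memory.

```python
from __future__ import annotations

from pathlib import Path


def parse_smaps_rollup(text: str) -> dict[str, int]:
    """Parse 'Name:   <value> kB' lines into a {name: kilobytes} mapping."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split()
        # The header line of the file has a different shape and is skipped.
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            fields[parts[0][:-1]] = int(parts[1])
    return fields


def read_rss_and_pss_kb(pid: int | str = "self") -> tuple[int, int] | None:
    """Return (Rss, Pss) in kB for a process, or None if unavailable."""
    try:
        text = Path(f"/proc/{pid}/smaps_rollup").read_text()
    except OSError:
        # Not Linux, kernel without smaps_rollup, or insufficient permissions.
        return None
    fields = parse_smaps_rollup(text)
    if "Rss" not in fields or "Pss" not in fields:
        return None
    return fields["Rss"], fields["Pss"]


if __name__ == "__main__":
    result = read_rss_and_pss_kb()
    if result is None:
        print("smaps_rollup is not available on this system")
    else:
        rss_kb, pss_kb = result
        print(f"RSS: {rss_kb} kB, PSS: {pss_kb} kB")
```

If the summed RSS of the process tree is far above the summed PSS, that would support the double-counting explanation; if the two are close, the difference from htop likely comes from somewhere else.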