
100% fragmentation

Open staabm opened this issue 6 years ago • 16 comments

We have a website for which APCu reports a fragmentation of 100%.

The cache itself only occupies 20MB of the maximum 128MB.

Is 100% fragmentation something I should be worried about? If so, what is the best way to fix it?

staabm, Feb 21 '19 08:02

So the actual question is: how should I handle the 100% fragmentation case? Doesn't APCu collect garbage, so that space is eventually reclaimed? Or should I avoid storing data too often (or data that changes regularly), because there is no built-in way to reclaim it other than clearing the whole cache manually?

staabm, Feb 24 '19 11:02

@staabm Looks like your issue is the same as #327. Nothing new there since Oct 2018.

trailsnail, Mar 02 '19 10:03

I feel like having some docs about how things are supposed to work, and what to do when fragmentation is high, would be enough for me.

staabm, Mar 02 '19 10:03

The fragmentation metric is pretty meaningless. X% fragmentation means that X% of the free memory sits in blocks smaller than 5MB. If you're routinely storing 5MB or more in a single entry, then that's a problem. If not, then you shouldn't particularly care.
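To make the metric concrete, here is a minimal Python sketch. It assumes the percentage is computed the way apc.php computes it: bytes held in free blocks smaller than 5 MiB, divided by all free bytes, with a single free block reported as 0%. The function name and the block sizes are hypothetical inputs, not data from this issue.

```python
FIVE_MIB = 5 * 1024 * 1024


def fragmentation_percent(free_block_sizes):
    """Share of free memory (by bytes) that sits in blocks smaller than 5 MiB.

    Assumption: mirrors apc.php's calculation; one free block (or none)
    is reported as 0% fragmentation.
    """
    total_free = sum(free_block_sizes)
    if len(free_block_sizes) <= 1 or total_free == 0:
        return 0.0
    small = sum(size for size in free_block_sizes if size < FIVE_MIB)
    return 100.0 * small / total_free


MIB = 1024 * 1024
# 100 MiB free, all in 1 MiB pieces: reported as 100% fragmented.
print(fragmentation_percent([1 * MIB] * 100))
# The same 100 MiB free, mostly in one large block: reported as 4%.
print(fragmentation_percent([96 * MIB] + [1 * MIB] * 4))
```

Both layouts have the same amount of free memory, but only the second one could satisfy a 5MB store, which is why the number only matters when entries are that large.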

It would probably make sense to reduce that threshold, or better, show a distribution of free block sizes.
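A minimal sketch of the suggested distribution, in Python: group free block sizes into power-of-two size classes. The bucketing scheme and the function names are illustrative assumptions; apc.php does not display this. Real input would presumably be the per-segment free block lists reported by `apcu_sma_info()`, which is where apc.php gets its numbers.

```python
from collections import Counter


def free_block_histogram(free_block_sizes):
    """Count free blocks per size class; class k holds sizes in [2**k, 2**(k+1))."""
    buckets = Counter()
    for size in free_block_sizes:
        if size > 0:
            buckets[size.bit_length() - 1] += 1
    return dict(sorted(buckets.items()))


def describe(histogram):
    """Print one line per size class, smallest class first."""
    for k, count in histogram.items():
        print(f"{2 ** k:>10} .. {2 ** (k + 1) - 1:>10} bytes: {count} block(s)")


# Hypothetical free block sizes in bytes.
describe(free_block_histogram([48, 64, 100, 4096, 5000, 1 << 20]))
```

Unlike a single percentage against a fixed 5MB threshold, the histogram shows directly which entry sizes can still be stored.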

nikic, Mar 02 '19 11:03

@nikic Does this mean I "lose" 5MB of storage when storing only a boolean or a simple int (or, e.g., using an APCu counter), because everything I store reserves 5MB?

staabm, Mar 04 '19 08:03

@staabm No, a boolean will only take up space for a boolean, plus overhead (which is fairly large relative to the size of a boolean, but not that large). What I meant is that fragmentation (as computed by apc.php) is only strictly problematic if you are storing very large entries. If you have 100% fragmentation and store a 5MB entry, then the store may fail (and trigger a cache expunge, depending on overall utilization). But if you're only storing small entries, then fragmentation (again, as computed) may not matter at all.
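A toy first-fit allocator in Python illustrates the point: total free space can be large while no single free block fits the request. This is a simplified model for illustration only, not APCu's actual shared-memory allocator; it ignores per-block headers and alignment, and the function names are hypothetical.

```python
def first_fit(free_blocks, request):
    """Return the index of the first free block that can hold `request` bytes, or None."""
    for i, size in enumerate(free_blocks):
        if size >= request:
            return i
    return None


def try_store(free_blocks, request):
    """Carve `request` bytes out of the first fitting block; return True on success."""
    i = first_fit(free_blocks, request)
    if i is None:
        return False
    remainder = free_blocks[i] - request
    if remainder:
        free_blocks[i] = remainder   # shrink the block in place
    else:
        del free_blocks[i]           # block fully used
    return True


MIB = 1024 * 1024
free = [1 * MIB] * 64               # 64 MiB free in total, but in 1 MiB pieces
print(sum(free) // MIB)             # 64
print(try_store(free, 5 * MIB))     # False: no single block holds 5 MiB
print(try_store(free, 16))          # True: a small value still fits
```

The large store fails purely because of how the free space is split up, while small stores keep succeeding.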

nikic, Mar 04 '19 08:03

We have a similar problem. We ran some long-term tests on PHP 7.2 and noticed that the site became inaccessible after a while. apc.shm_size was set to 256M and later to 512M, with the same result. I am now trying to pin down the problem by lowering it to 32M and logging every request to the website.

I am aware that the cache size is far too small for our website, which is why the "Cache full count" in apc.php keeps rising, but the site keeps functioning.

After a while (well within a suitable TTL) we run into the same problem: APCu apparently no longer has room to store another variable, even though apc.php reports over 50% free space. From then on, nothing works: no new variables can be added, and garbage collection is not triggered.

Fragmentation is at 100%, which as I understand it is not an actual problem (#205, #127), but I see it as an indicator that something is wrong. For the tests, the only non-default settings are apc.max_file_size=1M and apc.cli=1. There are over 7000 user entries, the largest being less than 500KB.

I would like to know about any config settings I can change to prevent this from happening. Any help is appreciated.

Setup:
APCu version: 5.1.12 (newest available)
PHP version: 7.2.10
Running on: CentOS 7

[Screenshot]

For reference, the TYPO3 error: #1232986277: Could not set value.

h4de5, Mar 19 '19 16:03

That does look like a case where fragmentation becomes a real problem. Your cache is less than half full, so no cache clear is triggered, but the high level of fragmentation still makes inserts fail.
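A rough Python model of why no clear happens here. It assumes that, when a store does not fit, the cache is wiped only if free memory is below a threshold of half of apc.shm_size, as described above. Treating a non-zero apc.smart as replacing that threshold with apc.smart × request size is an assumption about APCu internals, and any apc.ttl-dependent handling of stale entries is left out entirely. The function name is hypothetical.

```python
def should_wipe(shm_size, available, request_size, smart=0):
    """Simplified model of the decision to clear the whole cache after a failed store.

    Assumptions (not verified against APCu source here):
    - threshold is smart * request_size when smart > 0, else shm_size / 2;
    - the cache is wiped only if available (free) memory is below the threshold;
    - apc.ttl handling is ignored.
    """
    threshold = smart * request_size if smart > 0 else shm_size // 2
    return available < threshold


MIB = 1024 * 1024
# 32 MiB cache with 17 MiB free but fragmented: a 300 KiB store fails, yet no wipe.
print(should_wipe(32 * MIB, 17 * MIB, 300 * 1024))   # False
# Once free memory drops below half of shm_size, a wipe would be triggered.
print(should_wipe(32 * MIB, 15 * MIB, 300 * 1024))   # True
```

Under these assumptions, a cache that is more than half free but too fragmented to hold new entries never reaches the wipe, so failing stores keep failing.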

I don't think there's any configuration you can tweak, this is something that needs to be handled inside apcu.

nikic, Mar 21 '19 14:03

+1. After 2 days at 100% fragmentation, we can't insert data into the cache, not even a 300KB block. APCu 5.1.12, PHP 7.2.

mightydok, Apr 01 '19 17:04

@h4de5 If you set apc.ttl to a non-zero value, does the problem still persist?

mightydok, Apr 01 '19 19:04

@mightydok I would have to set it to a very short time: the situation in the screenshot occurs after 6 minutes, and with a 512MB cache it takes up to 30 minutes under medium load. Caching data for less than 1 hour does not make much sense for us (I am actually hoping to keep it for at least 6 to 12 hours). Increasing apc.entry_hint also didn't change much. I have been experimenting with the apc.smart value, but I can't find a good explanation of this setting.

h4de5, Apr 01 '19 20:04

@h4de5 I think you need more memory for the cache; on our production server we have apc.shm_size set to 1G. And since I set apc.ttl to 7200 seconds, we no longer reach 100% fragmentation.

@nikic How does the garbage collection logic work in APCu? When does APCu decide to clean the cache?

mightydok, Apr 02 '19 08:04

@mightydok It seems I may have to give up on understanding this module. ~I changed the ttl to 1800 (and smart to 30), and that may have been the trick:~

~The test site has been running for hours; even with a smaller cache size it performs noticeably fast, the cache hit rate is astonishingly high, and there are no more 100% fragmentation messages. I will leave it like this and keep monitoring.~

Edit: After investigating further, we found that the sudden change in behaviour was caused by a change in the application itself: we simply no longer use APCu for a certain part of the service. The configuration above did not change anything :(

In conclusion: our website seems to store too many, too-small values in the APCu cache, which is clearly a problem to address on our side. At the same time, a highly fragmented APCu cache can become unusable when more than half of its memory is free but too fragmented to use, because the cache clear mechanism is then never triggered.

For TYPO3 users: don't use APCu for cache_hash.

h4de5, Apr 02 '19 15:04

@h4de5 Why did you set smart to 30?

mightydok, Apr 04 '19 20:04

@nikic Do you have any suggestions for what we can do once we've reached high fragmentation? We've run into this issue quite a bit recently: the number of apcu_store failures keeps ramping up over time until the whole cache is unusable. I noticed in this comment that storing both small and large objects could aggravate the fragmentation; would you recommend we stop doing that? Or should we just hold out for an APCu fix? Thank you!

psyonix-luis, Oct 27 '20 19:10