
Will fragmentation reaching 100% cause any problems?

Open landso opened this issue 8 years ago • 23 comments

I use APCu in my project. After the program has been running for several days, the control panel shows:

Memory Usage
  • Free: 401.8 MBytes (78.5%)
  • Used: 110.2 MBytes (21.5%)

Detailed Memory Usage and Fragmentation
  • Fragmentation: 100.00% (401.8 MBytes out of 401.8 MBytes in 45818 fragments)

Fragmentation has reached 100%. I want to know whether this will cause any problems, such as failing to store or fetch variables under certain conditions. Or what is the right way to handle this? Thank you very much for helping me!

Here is my APCu information and settings:

General Cache Information
  • APCu Version: 4.0.7
  • PHP Version: 5.5.27
  • APCu Host: x.x.com (localhost.localdomain) (x.x.x.x)
  • Server Software: Apache
  • Shared Memory: 1 Segment(s) with 512.0 MBytes (mmap memory)
  • Start Time: 2015/08/24 14:29:15
  • Uptime: 6 days, 4 hours and 37 minutes
  • File Upload Support: 1

Cache Information
  • Cached Variables: 49203 (107.6 MBytes)
  • Hits: 241257575
  • Misses: 31061670
  • Request Rate (hits, misses): 508.96 cache requests/second
  • Hit Rate: 450.90 cache requests/second
  • Miss Rate: 58.05 cache requests/second
  • Insert Rate: 283.78 cache requests/second
  • Cache full count: 0

Runtime Settings
  • apc.coredump_unmap: 0
  • apc.enable_cli: 0
  • apc.enabled: 1
  • apc.entries_hint: 70000
  • apc.gc_ttl: 600
  • apc.mmap_file_mask: /tmp/apc_apache.51aQmB
  • apc.preload_path:
  • apc.rfc1867: 0
  • apc.rfc1867_freq: 0
  • apc.rfc1867_name: APC_UPLOAD_PROGRESS
  • apc.rfc1867_prefix: upload_
  • apc.rfc1867_ttl: 3600
  • apc.serializer: php
  • apc.shm_segments: 1
  • apc.shm_size: 512M
  • apc.slam_defense: 0
  • apc.smart: 0
  • apc.ttl: 3600
  • apc.use_request_time: 1
  • apc.writable: /tmp

landso avatar Aug 30 '15 11:08 landso

A single 512MB segment is rather large; have you tried making more segments of a more reasonable size?

krakjoe avatar Aug 30 '15 11:08 krakjoe

Thank you.

The server runs CentOS 6.5 x64.

I tried to modify php.ini to this:

apc.shm_size = 128M
apc.shm_segments = 4

After restarting Apache, the control panel shows:

Shared Memory 1 Segment(s) with 128.0 MBytes (mmap memory)

Shared Memory is 128M rather than 4 * 128M = 512M.

And I found that the Installation Instructions say:

Setting this to a value other than 1 has no effect in mmap mode since mmap'ed shm segments don't have size limits.

It seems not to work.

I have found that I sometimes get wrong values when fragmentation reaches 100%; it seems some variables fail to store/overwrite. (I'm not quite sure about that, so I need some official explanation :) thanks!)

landso avatar Aug 30 '15 17:08 landso

Oh woops, sorry, I missed that in your initial query. That's totally right, mmap is one segment only.

What is needed is a way to reproduce the bug, are you able to reproduce reliably ?

krakjoe avatar Aug 30 '15 17:08 krakjoe

So is this a bug? (it's still marked as "question") If not, how is one supposed to avoid excessive fragmentation?

teo1978 avatar Nov 09 '15 19:11 teo1978

I'm not able to reproduce this; it doesn't seem to matter what pattern of update/store/delete I use, I can't reach 100% fragmentation.

What I need is a reproducing script, it needs to use the same kind of patterns you are using and needs to reproduce the bug with minimal code.

I realise, this might be a big ask, it might also be impossible, but on your journey to try, you will find out many valuable pieces of information that may lead us to some answers.

krakjoe avatar Nov 10 '15 05:11 krakjoe

Please stop with the "journey to try" thing, it's so irritating. You need to realize that a user's journey is already long enough if they have come here to report an issue and provide the information they have: that might or might not include a reproducing script. Starting from there, the one who is supposed to take the "journey" of investigating the issue is the one who writes or maintains the code.

Unless the user is capable of contributing, of course, but by saying "I won't look at it until you give me a reproducing script" you are saying "I expect you to do half the job of fixing it by yourself".

teo1978 avatar Nov 10 '15 09:11 teo1978

Starting from there, the one who is supposed to take the "journey" of investigating the issue is who writes or maintains the code.

This is why OSS sucks... (for the maintainer(s))

PeeHaa avatar Nov 10 '15 10:11 PeeHaa

@teo1978 Sounds like someone who doesn't like the phrase "journey to try" has a chip on their shoulder - regardless, no need to bring that attitude here.

The creator of the library has no requirement to help out other users - they either do or they don't. This isn't a reply aligning with "well it's free, so don't complain", because that's BS. However, Joe can get to the bottom of the problem faster with a reproducing script that he requested.

I realise, this might be a big ask

That's a request. Get off your high horse. I don't know if you're having a bad day, but don't drag that over here. Working together (both the person with the problem and the owner of the lib) will produce faster results.

J7mbo avatar Nov 10 '15 10:11 J7mbo

The creator of the library has no requirement to help out other users - they either do or they don't.

I know, I simply criticise the attitude in his replies. Of course he has no requirement to fix the bugs.

However, when a user comes here and reports a bug, the user is helping the developer improve his code by pointing at issues. The developer might not be interested in improving his code at all, and hence in fixing issues, but if he does have an interest in that, then the reaction I'd expect would be to look into the issue as deeply as he can with the information (which may be little) given by the user, to see whether, with the user's hints, he can himself think up and construct a reproducible test case, and of course asking the user for more information is fine. That's how I react when people tell me there's a problem in software I wrote or maintain (though of course there's a big difference, because it's usually people who pay me to write or maintain the code).

What upsets me is the persistent tone in his replies to bugs which sounds like

  • a bug report is worthless unless it contains a reproducible case
  • a bug doesn't deserve any attention whatsoever if you (meaning the reporter, or anybody else other than the developer) can't provide a reproducible test case
  • I won't even waste my time investigating a bug in my code if you don't give me a reproducible test case
  • I don't even take into consideration the possible existence of a bug until I see with my eyes and I can reproduce it.

I find that as bad an attitude as the one you think I am having, that is, "requiring" the creator of the library to help.

I said "sounds like". Maybe it's not what he means. That's why I'm suggesting to reconsider the way he constantly phrases it.

Working together (both the person with the problem and the owner of the lib) will produce faster results.

Exactly, but I don't think his attitude is one of "working together" either. He's basically saying: you go find a reproducible test case, then perhaps I'll have a look.

This isn't a reply aligning with "well it's free, so don't complain"

It pretty much is, and if it was phrased that way, it would be more honest.

teo1978 avatar Nov 10 '15 11:11 teo1978

@teo1978 pro tip: stop acting like an idiot in github issue threads...

PeeHaa avatar Nov 10 '15 11:11 PeeHaa

So, let's start working together

The answers to the following questions might help me (or krakjoe himself, for that matter) build a reproducible test case.

First, do I understand correctly what fragmentation is:

  • some piece of information is cached occupying a segment of memory (and perhaps a bit more than it strictly needs for a number of possible reasons)
  • then later it's deleted, or anyway, freed for possible reuse
  • however that space is wasted because another piece of information to be cached is too big to fit in any of the existing "holes" and hence is stored in a new segment, occupying more memory instead of reusing "wasted" memory.

Or maybe something like this:

  • for efficiency or whatever reasons, allocating a new block of memory is preferred to reusing an existing "hole", even though the data would fit there - hence leading to fragmented "wasted" space
  • memory is allocated in chunks of some kind, meaning you can't for some reason reuse free space in a chunk until the whole chunk is completely freed, or something like that.
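If the first mental model is right, the failure mode is classic external fragmentation, which can be illustrated with a toy first-fit allocator. This is a Python sketch of the general idea only, not APCu's actual allocator:

```python
# Toy first-fit allocator illustrating external fragmentation:
# freed "holes" that are individually too small for a new allocation
# amount to wasted space, even when plenty of memory is free in total.

def first_fit(holes, size):
    """Return the index of the first free hole that fits `size`, or None."""
    for i, hole in enumerate(holes):
        if hole >= size:
            return i
    return None

# Free holes (in bytes) left behind by deleted cache entries.
holes = [64, 128, 64, 96]
total_free = sum(holes)             # 352 bytes free overall

# A 200-byte store fails even though 352 bytes are free:
# no single hole is big enough.
assert first_fit(holes, 200) is None
assert total_free > 200

# A small store still succeeds by reusing (and shrinking) a hole.
i = first_fit(holes, 50)
assert i == 0
holes[i] -= 50
```

Whether APCu behaves like this first-fit sketch or like the second (chunked) model is exactly what needs clarifying.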

Until I understand this better, I can't even start thinking of a reproducible script, which I'd love to contribute.

Is that correct, more or less? If so, then:

  • Does 100% fragmentation mean that, at a certain point, all the total cache memory is either used or "wasted" (meaning all the potentially free space is in fragments which cannot be used for the abovementioned reasons)?
  • is that something that is supposed to never happen?
  • if so, does APCu do anything to prevent fragmentation from reaching 100%?
  • otherwise (that is, if 100% fragmentation is supposed to be as rare as possible, but not always avoidable), what happens when you reach it? Does APC wipe out the entire cache? Or is it supposed to try and "defragment" the cache a little bit?

And finally: Is 100% fragmentation what causes the "uptime" to restart from zero, meaning this issue and #91 may be related?

teo1978 avatar Nov 10 '15 11:11 teo1978

@teo1978 pro tip: stop acting like an idiot in github issue threads...

Thanks for the tip. I would be immensely grateful if you could give me an additional pro tip by telling me where I acted like an idiot exactly.

teo1978 avatar Nov 10 '15 11:11 teo1978

I don't argue with people online - it's futile and a waste of my valuable time. If you want help: don't expect it, be humble, and above all don't respond like you did. Peace out.

J7mbo avatar Nov 10 '15 11:11 J7mbo

I don't argue with people online - it's futile and a waste of my valuable time.

You don't? I'm confused.

If you want help: don't expect it,

I didn't ask for help.

teo1978 avatar Nov 10 '15 11:11 teo1978

Thanks for the tip. I would be immensely grateful if you could give me an additional pro tip by telling me where I acted like an idiot exactly.

Sorry, I meant to say you sound like an idiot or a child.

The maintainer of the project has tried to repro (without success) and asks for more info + gives some directions. Yet you still feel his attitude is not correct according to your magical standards.

PeeHaa avatar Nov 10 '15 11:11 PeeHaa

In my experience, the behaviour of APCu under high fragmentation depends on the configuration and usage scenario. In our particular case (lots of small objects, a few fairly large objects), we frequently encountered scenarios where trying to apc_store our large objects failed and returned false. Sometimes this would trigger an internal expunge (freeing us up to try again automatically), but sometimes it wouldn't -- so we ended up wrapping the call and invoking apc_clear_cache ourselves when we had confidence the cache was fragmented beyond repair.

From my testing, I'd say that APCu isn't really designed to handle fragmentation. We had good mileage experimenting with various test scripts to understand the limitations and discover workarounds. For example, we observed that APCu compares the apc.ttl setting against access time but the per-entry ttl against creation time. This led to the curious behaviour of randomly evicting recently unused (but still valid) entries whenever a new entry hashed to the same slot as such an entry. We worked around this by setting apc.ttl = 86401, larger than any of our per-entry configured ttls. It was a good exercise overall.
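The eviction behaviour we observed can be sketched as follows. This is a Python model of the comparison logic as we understood it (field names are hypothetical; see apc_cache.c for the real code): apc.ttl is compared against the entry's *access* time, while the per-entry ttl is compared against its *creation* time.

```python
import time

APC_TTL = 86401  # our workaround: larger than any per-entry ttl we configure

def expired(entry, now, apc_ttl=APC_TTL):
    """Model of the eviction check: apc.ttl vs access time,
    per-entry ttl vs creation time."""
    if apc_ttl and now - entry['atime'] >= apc_ttl:
        return True   # evicted for being unused, even if still valid
    if entry['ttl'] and now - entry['ctime'] >= entry['ttl']:
        return True   # genuinely past its own ttl
    return False

now = time.time()
# Created an hour ago with a 2-hour ttl, but not accessed for a day:
entry = {'ctime': now - 3600, 'atime': now - 86400, 'ttl': 7200}
assert not expired(entry, now)             # survives with apc.ttl = 86401
assert expired(entry, now, apc_ttl=3600)   # evicted with apc.ttl = 3600
```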

lieut-data avatar Nov 10 '15 14:11 lieut-data

Thanks for the insight.

Did you mean apc.user_ttl or do you really mean apc.ttl??

apc.ttl is supposed to be the old setting for the opcode cache (and hence completely unused by APCu), and user_ttl is supposed to be the one in use. That is, if APCu is the drop-in replacement for APC it claims to be...

teo1978 avatar Nov 10 '15 14:11 teo1978

@lieut-data from your description it seems like APCu is unusable beyond any hope. Did you also find a good alternative, or is the PHP caching landscape really this bad?

teo1978 avatar Nov 10 '15 14:11 teo1978

I do really mean apc.ttl, and am referring to its usage in the following: https://github.com/krakjoe/apcu/blob/87c0ac6fef9ae4ac85090562db5d9b4f899163db/apc_cache.c#L789

To my knowledge, APCu is still the only option for an in-memory-only solution (modulo Yac, which doesn't really serve our purposes given its limitations). I do think that APCu has value in its current form without handling fragmentation or even locking (as is being discussed in another thread), though we are evaluating other caching options (e.g. the likes of memcache). That being said, we appear to have stabilized our usage of APCu for now by doubling our memory allocation to cover the expected lifetime of the various objects we're persisting -- so the investigation is not a huge priority.

lieut-data avatar Nov 10 '15 15:11 lieut-data

I do really mean apc.ttl

Great to know, thanks. That seems like a (pretty big) bug in itself: either in the documentation (if the behaviour is intended, for the sake of simplification and consistency with other settings, but then APCu is not at all a drop-in replacement for APC for the user cache) or in the code.

teo1978 avatar Nov 10 '15 17:11 teo1978

Bump, can you test with 4.0.10 or 5.1.2 ?

krakjoe avatar Dec 07 '15 09:12 krakjoe

Me? Sorry, we switched to Memcached, and since I only observed the issue on a production server, it's unlikely that I'll have the opportunity to test any time soon.

teo1978 avatar Dec 07 '15 22:12 teo1978

I'm seeing the 100% fragmentation too, under Debian stable, which ships APCu 4.0.7. I've set up one segment of 1GB for APCu.

Currently, from the apcu.php status file:
  • Fragmentation: 100.00% (757.4 MBytes out of 757.4 MBytes in 53473 fragments)
  • Free: 757.4 MBytes (74.0%)
  • Used: 266.6 MBytes (26.0%)
  • Hits: 392265123 (99.6%)

No increase in load or the like since I've hit the 100% fragmentation.

I've flushed all the objects with modification_time + ttl < time() (about 2000 objects; no change in fragmentation).

Then, I'm adding new random variables (about 40MB of data):

```php
$string = str_repeat('abcd', 10000);
for ($i = 0; $i < 1000; $i++)
{
    apc_store('delete-me-if-you-want-'.$i.'-'.$_SERVER['REQUEST_TIME'], $string, 3600);
}
```

Result:

  • free goes from 751 to 713MB
  • hit rate: same as before

To me, there's no impact on APCu's overall performance. So, the question is: how is fragmentation computed? Here's the code from apcu.php, from which I've removed everything except the fragmentation ratio:

```php
$fragsize = $freetotal = 0;
for ($i = 0; $i < $mem['num_seg']; $i++)
{
    foreach ($mem['block_lists'][$i] as $block)
    {
        /* Only consider blocks <5M for the fragmentation % */
        if ($block['size'] < (5 * 1024 * 1024))
            $fragsize += $block['size'];
        $freetotal += $block['size'];
    }
}
$frag = $fragsize / $freetotal;
```

Okay... so in fact, the fragmentation is the free space held in blocks smaller than 5MB relative to the total free space across all blocks. Not that big a deal, in fact...
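Plugging some hypothetical block lists into that formula shows why 100% is easy to reach. Here's a Python re-implementation of the snippet above, for illustration only:

```python
# Same computation as apcu.php: share of free space sitting in
# free blocks smaller than 5MB.
FIVE_MB = 5 * 1024 * 1024

def fragmentation(blocks):
    fragsize = sum(b for b in blocks if b < FIVE_MB)
    freetotal = sum(blocks)
    return fragsize / freetotal

# Tens of thousands of small free blocks, none reaching 5MB -> 100%,
# even though hundreds of MB are free in total.
small_blocks = [8 * 1024] * 50000          # ~390 MB of free space
assert fragmentation(small_blocks) == 1.0

# A single large free block drops the ratio sharply.
assert fragmentation(small_blocks + [100 * FIVE_MB]) < 0.5
```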

As I don't understand much of the C code, I've read the TECHNOTES: https://github.com/krakjoe/apcu/blob/master/TECHNOTES.txt#L101 Here we can see that blocks are created along the way, as needed (currently I have 140k variables stored in 54k blocks).

To me, there's nothing wrong with this 100% fragmentation... The issue would be if there were no space to create a block of 5MB, for example. So I think it's a bad formula; there must be another way to find out when things will soon go wrong (= APCu clearing all its cache, which is dramatic).

@krakjoe maybe we should take the trigger for clearing the cache, reverse it, and find a new formula?

Another option to avoid memory fragmentation is to use the memcached behaviour: create fixed-size blocks of XX MB, then in each block store only one size of variable. For instance, you create a 1MB block in which you'll store only variables of 16kB (or a little less than that), so there would be 64 variables stored in this first block. See http://www.mikeperham.com/2009/06/22/slabs-pages-chunks-and-memcached/ for example.
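A sketch of that slab idea (Python; page and class sizes purely illustrative, not memcached's actual values):

```python
# Toy slab allocator in the memcached style: fixed-size pages, each
# dedicated to one size class, so freeing an item leaves a hole that
# a same-class item can always reuse -- no external fragmentation.

PAGE_SIZE = 1024 * 1024                    # 1MB pages
SIZE_CLASSES = [1024, 4096, 16384, 65536]  # chunk sizes in bytes

def size_class(n):
    """Smallest size class that fits an item of n bytes."""
    for c in SIZE_CLASSES:
        if n <= c:
            return c
    raise ValueError("item too large for any size class")

def chunks_per_page(cls):
    """How many chunks of one class fit into a page."""
    return PAGE_SIZE // cls

# A ~15kB item lands in the 16kB class; 64 such chunks fill a 1MB page,
# matching the example above.
assert size_class(15000) == 16384
assert chunks_per_page(16384) == 64
```

The trade-off is internal fragmentation instead: a 15kB item wastes the remainder of its 16kB chunk.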

I've been a memcached user for a long time, but since my hosting company can have pings over 30ms between my servers, it's not a keeper... I've moved my whole cache system to APCu, with invalidation of critical keys through network calls (a little home-made PHP invalidation service on a dedicated port). This is working great, and the response time of APCu is better than memcached's as there are no network calls. Keys are replicated on every server (of course), so it's not distributed the way memcached is, but as long as invalid data is moved out of the way, it doesn't matter if it's computed multiple times on various servers.

dugwood avatar Sep 02 '16 12:09 dugwood