fatfree

Cache Configs?

Open richgoldmd opened this issue 9 years ago • 30 comments

I like the idea of using config files, especially for routes when they are very long. I also like the idea of using opcode caching like APC, etc.

When I include my routes in the PHP file, they are cached in the opcode cache. However, when I keep them in a config file, the file is read on every hit.

Since these change infrequently, doesn't it make sense to provide an option to cache configs? I have some code I am testing in my fork: https://github.com/richgoldmd/fatfree-core/commit/fd3315936330530ba55be1fcf1562ef1c8bb286a

Basically:

function config($file,$allow=FALSE,$cache_ttl=0) {
    $cache=Cache::instance();
    $hash=$this->hash($file);
    $matches=NULL;
    // Only consult the cache when a TTL was requested
    if (!$cache_ttl || !$cache->exists($hash,$matches)) {
        preg_match_all(
            '/(?<=^|\n)(?:'.
                '\[(?<section>.+?)\]|'.
                '(?<lval>[^\h\r\n;].*?)\h*=\h*'.
                '(?<rval>(?:\\\\\h*\r?\n|.+?)*)'.
            ')(?=\r?\n|$)/',
            $this->read($file),
            $matches,PREG_SET_ORDER);
        // Store freshly parsed matches only, so a cache hit does not
        // rewrite the entry (and reset its TTL) on every request
        if ($cache_ttl && $matches)
            $cache->set($hash,$matches,$cache_ttl);
    }
    if ($matches) {
        // .....

richgoldmd · Jun 17 '15

:+1:

xfra35 · Jun 17 '15

As for the implementation, I'm not sure a TTL makes sense. We don't care much whether the file gets cached for 1 hour or 1 week, but we do care that it gets refreshed as soon as we change a parameter.

So maybe something with filemtime like in Preview::render?

xfra35 · Jun 17 '15

Also config files that contain dynamic tokens (cf. the $allow parameter) can't be cached.

I think we could adopt the same strategy as in Preview::render: compile config files into plain PHP files. Once a file is compiled, we would just check that its filemtime hasn't changed and simply require it.
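A minimal sketch of that compile-and-require idea, outside the framework. It assumes a hypothetical `compiled_config()` helper and uses PHP's own `parse_ini_file()` rather than F3's parser, so [routes], [maps] and `{{ }}` tokens are not handled. Only the caching mechanics are shown. The compiled file is plain PHP, so the opcode cache can keep it in memory after the first request.

```php
<?php
// Hypothetical helper (not part of Fat-Free): compile an .ini file into a PHP
// file that returns an array, and reuse it until the .ini file changes.
// Uses PHP's parse_ini_file(), which knows nothing about F3's [routes],
// [maps] or {{ }} tokens; it only illustrates the compile-and-require idea.
function compiled_config(string $ini, string $cacheDir): array {
    clearstatcache(true, $ini);
    // The source mtime is part of the compiled file name, so an edited .ini
    // maps to a new compiled file (stale ones are not cleaned up here)
    $compiled = sprintf('%s/%s.%d.php',
        $cacheDir, md5(realpath($ini)), filemtime($ini));
    if (!is_file($compiled)) {
        // INI_SCANNER_TYPED converts "true", "3", etc. to bool/int values
        $data = parse_ini_file($ini, true, INI_SCANNER_TYPED);
        if ($data === false) {
            throw new RuntimeException("Cannot parse $ini");
        }
        // Write to a temp file, then rename, so a concurrent request never
        // requires a half-written file
        $tmp = $compiled . '.tmp' . getmypid();
        file_put_contents($tmp, '<?php return ' . var_export($data, true) . ';');
        rename($tmp, $compiled);
    }
    // A plain require lets the opcode cache serve the array on later hits
    return require $compiled;
}
```

With `opcache.validate_timestamps` enabled, embedding the mtime in the file name also avoids stale opcode entries, since an edited config always maps to a new path.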

xfra35 · Jun 17 '15

I think that makes sense, but if the caching mechanism is the filesystem, I don't think there is any real performance gain (since the parsing is trivial and, IMO, the bulk of the cycles is spent reading the file). So, unlike Preview::render, I would skip the whole thing if there is no cache, and maybe also if the cache backend is the filesystem.

richgoldmd · Jun 17 '15

I think for this to be meaningful, we would cache the last filemtime along with $matches and compare that to the filemtime of the actual config file.

Caching matches still works because dynamic tokens can still be resolved. Converting the whole process to a runnable PHP file seems like overkill.

It remains to be seen whether the cache check/cache load plus a stat on the file is more efficient than simply reading the file.
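One way to picture the "cache check + stat" variant, as a sketch: the file's `filemtime()` is stored next to the cached `$matches`, and the file is only re-read and re-parsed when the two differ. The `$cache` array stands in for a real backend such as APC or Redis, and `cached_matches()` is a hypothetical name. Dynamic tokens would still be resolved afterwards, as in config().

```php
<?php
// The config() regex from above, applied to the file contents
const CONFIG_REGEX = '/(?<=^|\n)(?:'.
    '\[(?<section>.+?)\]|'.
    '(?<lval>[^\h\r\n;].*?)\h*=\h*'.
    '(?<rval>(?:\\\\\h*\r?\n|.+?)*)'.
    ')(?=\r?\n|$)/';

// Hypothetical helper: $cache is a plain array standing in for a real
// backend (APC, Redis, ...). The file's mtime is stored next to $matches;
// on a hit only a stat() is needed, no file read and no regex.
function cached_matches(string $file, array &$cache): array {
    clearstatcache(true, $file);
    $mtime = filemtime($file);
    $key = md5($file);
    if (isset($cache[$key]) && $cache[$key]['mtime'] === $mtime) {
        return $cache[$key]['matches'];
    }
    preg_match_all(CONFIG_REGEX, file_get_contents($file), $matches,
        PREG_SET_ORDER);
    $cache[$key] = ['mtime' => $mtime, 'matches' => $matches];
    return $matches;
}
```

Whether the stat plus cache lookup beats a plain read is exactly the open question. The sketch only shows the control flow.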

Thoughts?

richgoldmd · Jun 17 '15

@richgoldmd if you cache only $matches, you just skip the preg_match + file read, which are not the most time-consuming parts. I just made a quick test with the definition of 50 variables + 50 routes:

  • ini file: ~6 ms
  • php file: ~3 ms
  • ini file with cached matches: ~6 ms

Actually, I'm pretty sure the most time-consuming task is the call to $f3->route. I made another test where ROUTES is cached and restored using a simple $f3->ROUTES+=$cached['ROUTES'], and I get an average of 0.3 ms for the same test as above!
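The restore step relies on PHP's array union operator: `+=` only copies keys that are not already present, so routes defined in code before the restore keep precedence over cached ones. Below is a framework-free sketch with a simplified, hypothetical route table (F3's real ROUTES entries hold more data per route):

```php
<?php
// Hypothetical, simplified route table: pattern => [verb => handler].
// F3's real ROUTES entries hold more fields; only the shape matters here.
function build_routes(int $n): array {
    $routes = [];
    for ($i = 1; $i <= $n; $i++) {
        $routes["/page$i"]['GET'] = "Page->show$i";
    }
    return $routes;
}

// First request: build the table once and keep a snapshot (in real code this
// would go to a cache backend or a file)
$cached = ['ROUTES' => build_routes(50)];

// Later request: a route defined in code, then the cached table merged in
$ROUTES = ['/page1' => ['GET' => 'Custom->override']];
$ROUTES += $cached['ROUTES'];   // union: keys already present are kept
```

The union is shallow. If code defines only `POST /page1`, the whole cached `/page1` entry, including its `GET` handler, is ignored. `array_replace_recursive($cached['ROUTES'], $ROUTES)` would merge per verb instead, with code-defined handlers still winning.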

@stehlo you're right. The only thing is that it's safer to hand a not-so-techie client an ini file than index.php.

xfra35 · Jun 18 '15

@xfra35 thanks for running the timings. What cache engine are you using in the test?

richgoldmd · Jun 18 '15

Are you using an opcode cache as well?

richgoldmd · Jun 18 '15

@richgoldmd APC, and yes, it has an opcode cache. This was just a quick test to get a rough order of magnitude.

xfra35 · Jun 18 '15

Well, then @xfra35, you are correct. @stehlo is also correct.

It seems the best compromise, if performance is a concern at this point (short of rendering out a PHP file that also handles the routes, which is likely not worth the effort), is to keep the configs, especially the routes, in a PHP file. Putting the details in a separate, explicitly required file still lets the opcode cache do its job. This would likely achieve the same goals: keeping the bootstrap file clean, keeping configs/routes easy to maintain, and leveraging caching to avoid repetitive file reads.

Thanks to you both for the input.

-Rich

richgoldmd · Jun 18 '15

Hey guys, before we close the topic, I've made a quick benchmark to find out what could significantly speed up the configuration phase.

In short, I've compared the loading times of:

  • 1 config file without caching
  • 2 smaller config files (globals + routes) with caching of routes (easier to cache than globals)
  • 1 config file with caching (routes + globals)

Each test is performed twice: the first one against an .ini file, and the second one against a .php file. Config data contains 50 vars + 50 routes.

Here are the results (averages of 20 runs):

| case | format | APC + op. cache | File cache |
|---|---|---|---|
| 1 file not cached | .ini | 6.3 ms | 6.4 ms |
| 1 file not cached | .php | 4.6 ms | 4.7 ms |
| 1 file cached + 1 not cached | .ini | 5.3 ms | 5.2 ms |
| 1 file cached + 1 not cached | .php | 3.4 ms | 4.0 ms |
| 1 file cached | .ini | 1.2 ms | 0.7 ms |
| 1 file cached | .php | 1.1 ms | 0.7 ms |

Looks like the initial suggestion of caching config files makes total sense.
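Anyone re-running these numbers could average timings over repeated calls like this. `avg_ms()` is a hypothetical helper using `hrtime(true)` (nanoseconds, PHP 7.3+). It times calls inside a single process, which is not the same as timing separate HTTP requests, because opcode and stat caches stay warm after the first run.

```php
<?php
// Hypothetical benchmark helper: average wall-clock time of $fn over $runs
// calls, in milliseconds. hrtime(true) returns nanoseconds (PHP >= 7.3).
function avg_ms(callable $fn, int $runs = 20): float {
    $total = 0;
    for ($i = 0; $i < $runs; $i++) {
        $start = hrtime(true);
        $fn();
        $total += hrtime(true) - $start;
    }
    return $total / $runs / 1e6;
}

// Example: 50 distinct variables, parsed by PHP's built-in INI parser
$ini = '';
for ($i = 1; $i <= 50; $i++) {
    $ini .= "VAR$i = value$i\n";
}
printf("parse_ini_string, 50 vars: %.3f ms\n",
    avg_ms(fn() => parse_ini_string($ini)));
```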

xfra35 · Jun 18 '15

@xfra35 Can you share how you cached the configs? I imagine from our discussion you cached more than just $matches?

richgoldmd · Jun 19 '15

Didn't see the paste link. Got it!

richgoldmd · Jun 19 '15

OK, that's interesting. To be clear: "File cache" means caching the files in APC when you set $f3->CACHE=TRUE, and the "APC + op. cache" column has $f3->CACHE=FALSE, so no caching is happening, just opcode caching.

richgoldmd · Jun 19 '15

Sorry I should have clarified the 2 columns:

  • APC + op. cache: test run on a VPS with APC + opcode cache enabled
  • File cache: test run locally with no in-memory cache (thus file cache is used)

xfra35 · Jun 19 '15

So that we're comparing apples to apples, can you run it on your VPS with $f3->CACHE=FALSE, and also with the folder DSN to force file caching?

richgoldmd · Jun 19 '15

Why do you say so? We are comparing the standard config() method (which doesn't cache anything, even when CACHE=TRUE) to semi- and fully cache-aware versions of config() in a cached environment.

The reason why I've run a 2nd test on a local machine after the VPS test was to make sure that the results were not biased by the opcode cache. And also to check if the performance gain would be as obvious with a file cache. It turns out to be similar.

What would a test with CACHE=FALSE bring?

xfra35 · Jun 19 '15

Was it 6 ms without any caching at all (the current behavior in v3.5.0)?

ikkez · Jun 19 '15

You're correct, CACHE=FALSE is redundant. (@ikkez: case 1 is the default framework mechanism.) However, I am concerned that your local machine with file caching performed better than your VPS with opcode + memory caching. It would be useful to compare them on the same server with file-based caching.

richgoldmd · Jun 19 '15

I am concerned that your local machine with file caching performed better than your VPS with OPcode + memory caching

That surprised me too ^^. On the other hand, that's the cheapest VPS I could find. I'll try on a faster one when I have time. You're free to run the script on your side too.

@ikkez without caching of config settings (but with CACHE=TRUE)

xfra35 · Jun 19 '15

Is the ini file you used something you can share?

— Reply to this email directly or view it on GitHub https://github.com/bcosca/fatfree/issues/845#issuecomment-113514535.

richgoldmd · Jun 19 '15

Sure: here it is.

xfra35 · Jun 19 '15

I just love this hair-splitting exercise, and I am elated that the community is still looking for ways to make this speedy framework faster - things that I never got (nor bothered) to refine further. @xfra35 is right. Routes take up the bulk of the parsing time. Perhaps an array_diff_assoc() on $this->hive['ROUTES'] within the config() method can help?
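Here is a sketch of isolating the routes a config step adds, by comparing the route table before and after it runs. It swaps in `array_diff_key()` for the suggested `array_diff_assoc()`. ROUTES values are nested arrays, and `array_diff_assoc()` compares values as strings, so every nested value becomes `"Array"` (with an "Array to string conversion" diagnostic). The route table shape and `added_routes()` are hypothetical.

```php
<?php
// Hypothetical sketch: capture only the routes a config step added, by
// comparing the route table before and after it runs.
// array_diff_key() instead of array_diff_assoc(): values here are nested
// arrays, which array_diff_assoc() would compare as the string "Array".
function added_routes(array $before, array $after): array {
    return array_diff_key($after, $before);
}

$before = ['/' => ['GET' => 'Home->index']];
$after  = $before + [
    '/about'   => ['GET' => 'Page->about'],
    '/contact' => ['GET' => 'Page->contact', 'POST' => 'Page->send'],
];
$new = added_routes($before, $after);   // only '/about' and '/contact'
```

A pattern that already existed before config() but gained a new verb is not reported by this key-level diff.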

bcosca · Jun 19 '15

More hair-splitting!

Here is data from an m3.large server on AWS with a locally installed Redis vs. file caching:

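| Mode | FileCache | Redis | FileCache (timed after cache setup) | Redis (timed after cache setup) |
|---|---|---|---|---|
| 1 | 4.58 | 7.29 | | |
| 2 | 3.04 | 5.75 | | |
| 3 | 3.64 | 6.57 | 3.44 | 6.11 |
| 4 | 2.65 | 5.48 | 2.39 | 5.05 |
| 5 | 0.73 | 0.80 | 0.49 | 0.43 |
| 6 | 0.65 | 1.08 | 0.40 | 0.43 |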

(Values are in ms.) The APC opcode cache is active. Caching, where used, is with Redis.

The last two columns are timed starting after the cache value is set, so they discount the overhead of connecting to the cache (which needs to be amortized over all caching operations, not just configs). Interestingly, the cache setup overhead is greater than the file access.

richgoldmd · Jun 19 '15

After looking more into it, it turns out that most of the time is spent in $f3->set(). So the big gap observed between cases 3-4 and 5-6 in the previous test is mostly due to replacing 50 calls to $f3->set() with 2.

I've made another test where we keep the 50 calls and the performance gain is smaller:

| case | format | VPS (APC) | Local (File) |
|---|---|---|---|
| 1 file not cached | .ini | 5.8 ms | 5.7 ms |
| 1 file not cached | .php | 3.5 ms | 4.4 ms |
| 1 file cached + 1 not cached | .ini | 5.2 ms | 4.6 ms |
| 1 file cached + 1 not cached | .php | 2.9 ms | 3.6 ms |
| 1 file cached | .ini | 2.4 ms | 3.6 ms |
| 1 file cached | .php | 2.4 ms | 3.6 ms |

Also I've added 2 more cases to compare $f3->set() and $f3->route():

| case | VPS | Local |
|---|---|---|
| 50x set | 2.2 ms | 3.2 ms |
| 50x route | 1.2 ms | 0.8 ms |

Looks like if you want to speed things up a bit, you should first look at the set() method.

xfra35 · Jun 19 '15

When cache is enabled, this line takes most of the time spent in set(). If we skip the $cache->exists() call, we get:

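| case | format | VPS (APC) | Local (File) |
|---|---|---|---|
| 1 file not cached | .ini | 5.6 ms | 3.3 ms |
| 1 file not cached | .php | 2.6 ms | 2.1 ms |
| 1 file cached + 1 not cached | .ini | 4.1 ms | 2.2 ms |
| 1 file cached + 1 not cached | .php | 2.2 ms | 1.4 ms |
| 1 file cached | .ini | 1.9 ms | 1.2 ms |
| 1 file cached | .php | 1.7 ms | 1.2 ms |

| case | VPS | Local |
|---|---|---|
| 50x set | 1.6 ms | 1.0 ms |
| 50x route | 1.0 ms | 0.8 ms |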

Please draw some conclusions.. I'm a bit confused about what we're trying to solve here ^^

xfra35 · Jun 19 '15

The original question was whether there is a performance hit when using configs, because their contents don't benefit from the opcode cache and require an extra file read instead. I think we've demonstrated that this is true, but also that caching configs is complicated, and that much of the cost is in the parsing, more so than in retrieving the configs from the file or the cache.

I think the horse is dead. To properly realize the performance gains, the routes and globals need to be handled separately, and the implementation is application-specific (e.g. are there multiple route config files? Are the globals prefixed? What about maps?)

For my own part, I think I'll separate the route configs and cache the routes in the bootstrap code, following @xfra35's prototype and @bcosca's suggestion regarding array_diff_assoc().

Thanks for weighing in.

richgoldmd · Jun 20 '15

Why don't we simply cache the whole HIVE (or its changes) after the config has been parsed and processed, and restore it from the cache as long as the config file's modification time has not changed? That reduces all config actions to a single set call.
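A sketch of that idea with plain arrays. The top-level hive entries that a config step adds or changes are stored together with the config file's mtime. Later requests restore them with a single `array_replace()` while the mtime is unchanged. `$hive` stands in for F3's HIVE, and `apply_config()`, `hive_delta()` and `load_config()` are hypothetical names. `apply_config()` fakes the real parsing and set()/route() calls.

```php
<?php
// Hypothetical placeholder for parsing $file and calling set()/route()
function apply_config(array $hive, string $file): array {
    $hive['DEBUG'] = 3;
    $hive['UI'] = 'ui/';
    $hive['ROUTES']['/'] = ['GET' => 'Home->index'];
    return $hive;
}

// Top-level keys that are new or whose value changed
function hive_delta(array $before, array $after): array {
    $delta = [];
    foreach ($after as $key => $value) {
        if (!array_key_exists($key, $before) || $before[$key] !== $value) {
            $delta[$key] = $value;
        }
    }
    return $delta;
}

// $cache is a plain array standing in for a real cache backend
function load_config(array $hive, string $file, array &$cache): array {
    clearstatcache(true, $file);
    $mtime = filemtime($file);
    $entry = $cache[$file] ?? null;
    if ($entry !== null && $entry['mtime'] === $mtime) {
        // One bulk merge instead of many individual set() calls
        return array_replace($hive, $entry['delta']);
    }
    $after = apply_config($hive, $file);
    $cache[$file] = ['mtime' => $mtime, 'delta' => hive_delta($hive, $after)];
    return $after;
}
```

This assumes config() runs at the same point in the bootstrap on every request. A restored ROUTES entry replaces the whole table with the one captured when the delta was recorded.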

ikkez · Jun 20 '15

just as a little follow up:

xfra35: When cache is enabled, this line takes most of the time spent in set()

Indeed, this has changed in v3.6 now, so maybe the issue isn't that big anymore.

ikkez · Jan 09 '17

So @ikkez asked if I was bored enough to look into this. I came up with a slightly different solution that had a good impact on performance, even with a small config file. I think there are still some things to improve, but the bulk of it is here.

// updated config method
   /**
    *   Configure framework according to .ini-style file settings;
    *   If optional 2nd arg is provided, template strings are interpreted
    *   @return object
    *   @param $source string|array
    *   @param $allow bool
    *   @param $config_ttl int cache TTL in seconds; 0 disables config caching
    **/
    function config($source,$allow=FALSE,$config_ttl=0) {
        if (is_string($source))
            $source=$this->split($source);
        if ($allow)
            $preview=Preview::instance();
        $is_caching_enabled = $config_ttl !== 0;
        $has_routes = false;
        foreach ($source as $file) {
 
// pretty much this if statement
            if($is_caching_enabled) {
                $Cache = Cache::instance();
                $config_array = [];
                // other cache keys could be implemented to account for unique template vars if $allow = true
                $cache_key = $this->hash($file).'.ini';
                $Cache->exists($cache_key, $config_array);
                if(!empty($config_array)) {
                    $this->mset($config_array);
                    continue;
                }
            }
 
            preg_match_all(
                '/(?<=^|\n)(?:'.
                    '\[(?<section>.+?)\]|'.
                    '(?<lval>[^\h\r\n;].*?)\h*=\h*'.
                    '(?<rval>(?:\\\\\h*\r?\n|.+?)*)'.
                ')(?=\r?\n|$)/',
                $this->read($file),
                $matches,PREG_SET_ORDER);
 
            $lvals = [];
            if ($matches) {
                $sec='globals';
                $cmd=[];
                foreach ($matches as $match) {
                    if ($match['section']) {
                        $sec=$match['section'];
                        if (preg_match(
                            '/^(?!(?:global|config|route|map|redirect)s\b)'.
                            '(.*?)(?:\s*[:>])/i',$sec,$msec) &&
                            !$this->exists($msec[1]))
                            $this->set($msec[1],NULL);
                        preg_match('/^(config|route|map|redirect)s\b|'.
                            '^(.+?)\s*\>\s*(.*)/i',$sec,$cmd);
                        continue;
                    }
                    if ($allow)
                        foreach (['lval','rval'] as $ndx)
                            $match[$ndx]=$preview->
                                resolve($match[$ndx],NULL,0,FALSE,FALSE);
                    if (!empty($cmd)) {
                        isset($cmd[3])?
                        $this->call($cmd[3],
                            [$match['lval'],$match['rval'],$cmd[2]]):
                        call_user_func_array(
                            [$this,$cmd[1]],
                            array_merge([$match['lval']],
                                str_getcsv($cmd[1]=='config'?
                                $this->cast($match['rval']):
                                    $match['rval']))
                        );
// and this one
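                        // Section names are matched case-insensitively, so compare
                        // the captured command name rather than the raw section
                        if($is_caching_enabled && strtolower($cmd[1]) === 'route') {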
                            $has_routes = true;
                        }
                    }
                    else {
                        $rval=preg_replace(
                            '/\\\\\h*(\r?\n)/','\1',$match['rval']);
                        $ttl=NULL;
                        if (preg_match('/^(.+)\|\h*(\d+)$/',$rval,$tmp)) {
                            array_shift($tmp);
                            list($rval,$ttl)=$tmp;
                        }
                        $args=array_map(
                            function($val) {
                                $val=$this->cast($val);
                                if (is_string($val))
                                    $val=strlen($val)?
                                        preg_replace('/\\\\"/','"',$val):
                                        NULL;
                                return $val;
                            },
                            // Mark quoted strings with 0x00 whitespace
                            str_getcsv(preg_replace(
                                '/(?<!\\\\)(")(.*?)\1/',
                                "\\1\x00\\2\\1",trim($rval)))
                        );
                        preg_match('/^(?<section>[^:]+)(?:\:(?<func>.+))?/',
                            $sec,$parts);
                        $func=isset($parts['func'])?$parts['func']:NULL;
                        $custom=(strtolower($parts['section'])!='globals');
                        if ($func)
                            $args=[$this->call($func,$args)];
                        if (count($args)>1)
                            $args=[$args];
                        if (isset($ttl))
                            $args=array_merge($args,[$ttl]);
                        call_user_func_array(
                            [$this,'set'],
                            array_merge(
                                [
                                    ($custom?($parts['section'].'.'):'').
                                    $match['lval']
                                ],
                                $args
                            )
                        );
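                        if($is_caching_enabled) {
                            // Keep the top-level hive key: "app.name" or a custom
                            // [app] section is stored under hive['app']
                            $key=($custom?($parts['section'].'.'):'').
                                $match['lval'];
                            $lvals[] = preg_split('/\.|\[|->/',$key)[0];
                        }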
                    }
                }
            }
 
// and this one
            if($is_caching_enabled) {
                $config_array = $this->hive();
                $config_array = array_intersect_key($config_array, array_flip($lvals));
                if($has_routes) {
                    $config_array['ROUTES'] = $this->get('ROUTES');
                }
                $Cache->set($cache_key, $config_array, $config_ttl);
            }
        }
        return $this;
    }

This is the example config file I had:

[globals]
CACHE = true
UI = ui/
TEMP = tmp/
ENVIRONMENT = DEVELOPMENT
DEBUG = {{ @ENVIRONMENT === 'DEVELOPMENT' ? 3 : 0}}
LOG = log/

[routes]
GET / = Test_Controller->testEndpoint

One hiccup I ran into: the CACHE hive variable had to be set before $f3->config() was called. I'm sure we could easily work around this, though.

All results were run with ab -n 10000 -c 500 http://localhost:8000/

The results with no caching enabled: https://fpaste.me/aH3bAIVf6K

Percentage of the requests served within a certain time (ms)
50% 926
66% 944
75% 952
80% 957
90% 965
95% 980
98% 1002
99% 1006
100% 1009 (longest request)

The results with caching enabled: https://fpaste.me/U2CO5VfLAM

Percentage of the requests served within a certain time (ms)
50% 519
66% 529
75% 539
80% 548
90% 556
95% 590
98% 633
99% 652
100% 656 (longest request)

The results with caching and igbinary enabled: https://fpaste.me/HjuqnAoMd6

Percentage of the requests served within a certain time (ms)
50% 500
66% 507
75% 517
80% 522
90% 528
95% 544
98% 559
99% 561
100% 565 (longest request)

Like others have mentioned (and as I've tested on my own), caching the routes brings the biggest performance gain. https://fpaste.me/GjoLv0hzTA shows how much extra processing time goes into route processing alone.

From Slack Convo Aug 13th:

Very similar results if I change it from a closure to a Class->method route. If I take the routes, cache $fw->ROUTES at the end, then comment out all the routes and restore $fw->ROUTES from the cache instead, I get a nice little boost: from about 350-400 ms per request on average to 230-270 ms. I get very similar results if, instead of the Cache class, I just serialize $fw->ROUTES with file_put_contents and read it back with file_get_contents. I bet it'd be a little better with igbinary or msgpack. Fancy... with igbinary, that drops to about 185-200 ms per request. So with just a couple of tweaks, you can cut the request time in half.
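Here is a sketch of the file-based variant described above, assuming hypothetical `routes_save()`/`routes_load()` helpers around `file_put_contents()`/`file_get_contents()`. igbinary is used only when its functions exist. Otherwise the helpers fall back to native `serialize()`.

```php
<?php
// Hypothetical helpers for persisting a route table between requests.
// igbinary is used when the extension is loaded, otherwise serialize().
function routes_save(string $path, array $routes): void {
    $data = function_exists('igbinary_serialize')
        ? igbinary_serialize($routes)
        : serialize($routes);
    // LOCK_EX avoids interleaved writes from concurrent requests
    file_put_contents($path, $data, LOCK_EX);
}

function routes_load(string $path): ?array {
    if (!is_file($path)) {
        return null;  // nothing cached yet: define routes the normal way
    }
    $data = file_get_contents($path);
    $routes = function_exists('igbinary_unserialize')
        ? igbinary_unserialize($data)
        : unserialize($data, ['allowed_classes' => false]);
    return is_array($routes) ? $routes : null;
}
```

Closure handlers cannot be serialized (PHP throws an exception), which fits the observation about switching to Class->method routes. Switching serializers also invalidates the stored file. `routes_load()` then returns null, and the routes have to be defined the normal way.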

Test File

n0nag0n · Dec 03 '20