
Feature Request: apcu_fetch that sets TTL, and an apcu_touch

Open mikekasprzak opened this issue 8 years ago • 3 comments

Bit of a feature request. I realize APCu is in a weird place with respect to backwards compatibility, but AFAIK it's the only way for PHP to write directly to shared memory without having to feed data through network protocols (à la Redis, Memcached, etc.). I'm crazy about performance. ;)

I use APCu as a fast cache. It's very common for me to fetch from the database, process the data as something I can use, generate a name, then store a copy in APCu with that name. Then any time I want the data, I can pre-generate the name, check if it's in the "cache", and return it if available (otherwise get it from the DB and repeat).

This works well, but if I use a TTL, the entry can expire while the cached data is still good. Unfortunately, right now, if I want to update the TTL, I have to rewrite the entire chunk of data with an apcu_store.

At the very least, I would love to see an apcu_touch function: give it a key, and you can explicitly set the TTL. I would certainly cringe less if that were an option. :)

Ideally though, I would really like to see apcu_fetch, apcu_inc, apcu_dec, and apcu_cas get an optional TTL parameter.
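For concreteness, here's a minimal sketch of today's workaround versus what's being proposed. It requires the APCu extension; $key is an illustrative placeholder, and apcu_touch and the extra TTL parameter on apcu_fetch are hypothetical, i.e. they do not exist in APCu's API:

```php
<?php
// Today's workaround: refreshing a TTL means re-storing the whole payload.
$value = apcu_fetch($key, $ok);
if ($ok) {
    apcu_store($key, $value, 300); // rewrite several KB of data just to reset the TTL
}

// Proposed (hypothetical -- not part of APCu's API):
// apcu_touch($key, 300);               // reset the TTL only, no data rewrite
// $value = apcu_fetch($key, $ok, 300); // fetch and refresh the TTL in one call
```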

If something like this has already been discussed, my apologies. AFAIK the only documentation for APCu is the old APC docs.

Thanks.

mikekasprzak avatar Oct 28 '15 15:10 mikekasprzak

This doesn't really relate to what I'm talking about, but I'll humor you anyway.

Sockets are slow. They have a lot of overhead. You need to pack up your data (effectively serializing it) every time you read/write it over a socket. Without getting too detailed, sockets have connections that need to be negotiated, data stored and piped through those connections, and verified.

APCu writes directly to memory, and that memory is shared across threads. The overhead is negligible by comparison. Most web languages don't have a concept of shared memory, and neither does PHP. Writing robust multithreaded code can be tricky, so most languages don't bother supporting it. APCu is a workaround. Yes, code that deals with sockets can scale up better across multiple machines than code that deals locally with threads and memory locking. But the thing is, we measure the performance of socket connections in milliseconds, and reads/writes to memory in cycles (i.e. clock ticks of the CPU and bus). The potential performance gains from not having to communicate over sockets are huge.

Redis and MySQL are there for a robust data store. MySQLi and the equivalent library for Redis are dumb libraries, in that all they do is talk to the data servers. The data servers may be smart, caching recently used data in RAM instead of having to pull from disk, but that data still needs to be serialized and piped through the socket. With APCu, no communication over a socket is required; data is immediately available locally. The data is not robust in the same way as Redis or MySQL, but as long as the web server is online, it is persistent and reliable without any need to negotiate or validate it (because it's RAM). If the web server ever shuts down, it's no problem: you wouldn't ever write data you expect to be durable to it. You're only ever writing a copy, or things of little consequence if lost (like session IDs, which can be renegotiated).

Now, what I'm talking about is the TTL, i.e. the time-to-live of the data I've stored in shared memory. Internally, APCu tracks when data was written and when it should expire, and compares against the current time before returning it. This is very cheap (read the system clock, read the stored time, subtract, branch on the result). What I'm asking for is a function to control the TTL, instead of rewriting my data every time. It's faster to set a new expiration time than it is to write several KB of data, which is basically what I store in APCu. My data is fetched from the database and lightly processed. Much of it doesn't change, so what a waste it is to send it again over a socket (and reprocess it).

mikekasprzak avatar Jan 04 '16 10:01 mikekasprzak

I'm potentially storing almost anything. The faster I can generate a page, the better it is for my users, and the more users I can serve on a machine.

The main things I'm not storing are one-off things, such as search queries, or, for security reasons, password hashes. I could, but YMMV.

This stuff works today, and thanks to APCu it's super-fast. As an example: on one of my pages, any time I get a "cache hit" (i.e. pull the data from APCu), I can generate the page in roughly 0.3 ms (i.e. 300 μs) on a $10/mo Linode VPS. Any time it "cache misses" (i.e. needs to hit the database server), it takes about 10 ms. In this case, that's over a 30x speedup. I'm looking to optimize my algorithms by only updating the TTL on every cache hit.

The point of setting a TTL is that it's an algorithmically easy way to keep common things cached, while automatically flagging things for the garbage collector to release (by letting them expire). For example, say there's a page that a dozen users visit over a period of 5 minutes. If my TTL is also 5 minutes, and I update the TTL every time someone visits the page, the page will still be available beyond the initial 5 minutes. If only one user hits the page and 45 minutes pass, that data will need to be re-fetched/re-generated. That's an easy way of dynamically handling pages that get a lot of traffic.
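The sliding-expiration pattern above can be sketched with existing APCu calls (requires the APCu extension; renderPage() and $pageKey are illustrative placeholders, not part of any real API):

```php
<?php
const PAGE_TTL = 300; // 5 minutes, in seconds

$html = apcu_fetch($pageKey, $hit);
if ($hit) {
    // Re-store to push the expiry back on every hit; with an apcu_touch
    // this full rewrite of the payload would be unnecessary.
    apcu_store($pageKey, $html, PAGE_TTL);
} else {
    $html = renderPage(); // expensive DB fetch + processing
    apcu_store($pageKey, $html, PAGE_TTL);
}
echo $html;
```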

Another use for TTL: if I'm storing an active session ID, it makes sense to push back the expiration every time the user does something (i.e. auto-logging them out if they're inactive).

One more: let's say I want to limit an IP address to 10 failed logins per hour. This isn't necessarily something that needs to be stored in the database. The faster I can deny the login attempt, the better I can mitigate a DoS attack. Because I can write to RAM very fast, I can detect if 100+ attempts happen over a short period of time and take appropriate action.
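A minimal sketch of such a counter using only existing APCu calls (requires the APCu extension; $ip and the limits are illustrative assumptions):

```php
<?php
$key = 'failed_login_' . $ip;

// Create the counter with a 1-hour TTL if absent; apcu_add is a no-op
// if the key already exists, so the original expiry is preserved.
apcu_add($key, 0, 3600);

$attempts = apcu_inc($key); // atomic increment, returns false on failure
if ($attempts !== false && $attempts > 10) {
    http_response_code(429);
    exit('Too many failed login attempts; try again later.');
}
```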

For best performance, a load balancer should be routing the same users to the same servers. But even if it doesn't, things that are common and requested by multiple users will be cached across many of your servers anyway, and released once the TTLs expire.

That's the point: to be fast. Nothing in a server is faster than its RAM. What a shame that web development encourages people to pipe everything over sockets when a local machine could just remember.

mikekasprzak avatar Jan 04 '16 16:01 mikekasprzak

+1 - I got myself obsessed with RAM caching and I try to keep pushing every single millisecond down. (When talking about thousands of requests per second, every millisecond matters.)

Here is my example of caching rarely-changing JSON data from a file. I tried to keep it easy to understand when quickly looking through the code, and as fast as possible (talking about performance).

// Fetch from cache; on a miss (apcu_fetch returns false), rebuild and store.
if (($jsonData = getCache(md5($file))) === false)
	$jsonData = setCache(md5($file), json_decode(file_get_contents(DB . $file), true), 300);
var_dump($jsonData); // var_dump prints directly; no echo needed

function getCache($CacheKey) {
	// Returns false on a cache miss (or if the stored value is literally false)
	return apcu_fetch($CacheKey);
}
function setCache($CacheKey, $CacheVal = '', $CacheLife = 0) {
	apcu_store($CacheKey, $CacheVal, $CacheLife);
	return $CacheVal; // return the value whether or not the store succeeded
}

It takes only two lines and one if clause to always get the value, either fresh or cached. I use my own getCache/setCache functions because I've been experimenting with other caching solutions; also, setCache returns the data itself regardless of whether apcu_store succeeds or fails. BTW, if you have any suggestions that would increase performance, feel free to share. (little comparison)

A little "bonus":

// DoS protection: per-IP, per-action request throttling
define("VISITOR_IP", getUserIP());
define("TIME_NOW", round(microtime(true) * 1000, 0)); // current time in milliseconds
function callFrequency($ms = 100, $action = "") {
	$key = md5($action . VISITOR_IP);
	$ttl = (int) round($ms / 1000, 0) + 1; // TTL in seconds, at least 1
	if (($lastVisit = getCache($key)) == false)
		$lastVisit = setCache($key, TIME_NOW, $ttl);
	if ($lastVisit == TIME_NOW) return null; // first visit, entry was just stored
	if ((TIME_NOW - $lastVisit) < $ms) {
		echo "Too many requests in a short period of time. Wait " . round(($ms - (TIME_NOW - $lastVisit)) / 1000, 2) . " seconds before the next attempt";
		exit();
	}
	setCache($key, TIME_NOW, $ttl); // refresh the timestamp (and TTL) on each allowed call
}
function getUserIP() {
	// Note: Client-IP and X-Forwarded-For are client-supplied and spoofable;
	// only trust them behind a proxy you control.
	if (filter_var(@$_SERVER['HTTP_CLIENT_IP'], FILTER_VALIDATE_IP)) return $_SERVER['HTTP_CLIENT_IP'];
	if (filter_var(@$_SERVER['HTTP_X_FORWARDED_FOR'], FILTER_VALIDATE_IP)) return $_SERVER['HTTP_X_FORWARDED_FOR'];
	return @$_SERVER['REMOTE_ADDR'];
}


Gaspadlo avatar Feb 07 '17 09:02 Gaspadlo