pecl-system-sync
PHP SyncReaderWriter Issues
Hello,
I'm using the PHP SyncReaderWriter class for cache synchronization. The algorithm is:
- Read lock:
$sync = new \SyncReaderWriter( $lockKey );
$result = $sync->readlock( $this->lockTimeoutMilliseconds );
- Check if the value exists in the cache and is not expired
- If it exists and is valid, unlock and return the value:
if ($result) { $sync->readunlock(); }
return $value;
- If it does not exist or is not valid, release the read lock and take the write lock:
$sync->readunlock();
$sync = new \SyncReaderWriter( $lockKey );
$result = $sync->writelock( $this->lockTimeoutMilliseconds );
- Check whether the value has been created in the meantime by another process. If so, release the write lock and return the value:
if ($result) { $sync->writeunlock(); }
return $value;
- If it has still not been created, create the value, put it into the cache, release the write lock, and return the value:
if ($result) { $sync->writeunlock(); }
return $value;
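For reference, the six steps above can be sketched as a single function. This is only an illustration under some assumptions: `cachedGet`, `$cache` (a plain array standing in for the real cache backend) and `$compute` are hypothetical names, the expiry check is reduced to a simple existence check, and one lock object is reused for both phases instead of creating a second `SyncReaderWriter`. The `$lock` parameter is left untyped so that any object with the same four methods can be passed in.

```php
<?php
// Hypothetical sketch of the read-then-write locking pattern described above.
// $lock is expected to offer readlock/readunlock/writelock/writeunlock,
// as \SyncReaderWriter does. $cache is a stand-in for the real cache.
function cachedGet($lock, array &$cache, string $key, callable $compute, int $timeoutMs = 5000)
{
    // Steps 1-3: take the read lock, look up the value, release only if acquired.
    $gotRead = $lock->readlock($timeoutMs);
    try {
        if (array_key_exists($key, $cache)) {
            return $cache[$key];
        }
    } finally {
        // finally runs on return and on exceptions alike.
        if ($gotRead) {
            $lock->readunlock();
        }
    }

    // Steps 4-6: take the write lock, re-check, create the value if still missing.
    // As in the original post, the code proceeds even if the lock timed out.
    $gotWrite = $lock->writelock($timeoutMs);
    try {
        if (!array_key_exists($key, $cache)) {
            $cache[$key] = $compute();
        }
        return $cache[$key];
    } finally {
        if ($gotWrite) {
            $lock->writeunlock();
        }
    }
}

// Usage with the real extension (requires pecl sync to be installed):
// $lock  = new \SyncReaderWriter($lockKey);
// $value = cachedGet($lock, $cache, 'some-key', function () { return 'fresh value'; });
```

Putting each unlock in a `finally` block means an exception thrown while reading or building the value cannot skip the release, which removes one possible source of a lock that is held longer than intended.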
The problem is that sometimes a lock appears not to be released and hangs forever until I reboot the Linux machine; restarting php-fpm does not unlock it.
There are no fatal errors in the PHP error log.
There is one thing I may be doing wrong:
If acquiring the lock fails, for example after 5 seconds of waiting, I don't release the lock (Steps 3, 5 or 6). I just get the value from the cache and return it, or try to take the write lock and create the value.
I'm releasing the lock only if acquired:
if ($result) { $sync->writeunlock(); }
Should I always release it, even if acquiring failed?
$sync->writeunlock(); // Releasing the lock even if acquiring failed? ($result != true)
Should I always try to release the lock even if I failed to acquire it, or is something else going on here?
I'm running a pretty busy system/website with many visitors and tens of thousands of items in the cache, so synchronisation is heavily used, but it generates a lot of hanging locks. In the end I had to switch synchronisation off and accept cache slamming until I find a solution to the problem or another way to synchronise.
I don't see an obvious problem with the approach. That's the whole point of reader/writer locks. Many readers can be accessing at one time. When a writer needs to write, new readers are locked out until the writer obtains and releases the lock. Then readers can obtain locks again (i.e. so a writer doesn't starve). If you read a value without a lock, you could read a bad value because another process could be writing to that section of memory at the time of the read. In general, you want to hold a lock for as little time as necessary.
All locks should be released by PHP itself automatically (i.e. you don't need to do that) when the PHP script stops running, with the sole exceptions of PHP itself segfaulting or the process being force-killed by the OS for some reason. You don't need to reboot the whole system to clean up the shared memory objects that house the locks. On Linux, shared memory objects are stored in /dev/shm and can be deleted by the root user.
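A small script along these lines could help with inspecting /dev/shm while a lock is stuck. It is only a sketch: `listSharedMemoryObjects` is a hypothetical helper, and which entries belong to the sync extension is not stated in this thread, so one way to find out is to compare the listing before and after creating a lock with a known key.

```php
<?php
// Hypothetical helper: list the entries of a directory (by default /dev/shm)
// with their size and last-modified time. It does not delete anything.
function listSharedMemoryObjects(string $dir = '/dev/shm'): array
{
    $entries = [];
    if (!is_dir($dir)) {
        return $entries;
    }
    // scandir() returns false on failure, so fall back to an empty list.
    foreach (scandir($dir) ?: [] as $name) {
        if ($name === '.' || $name === '..') {
            continue;
        }
        $stat = @stat($dir . '/' . $name);
        if ($stat === false) {
            continue; // entry vanished or is not readable
        }
        $entries[] = [
            'name'  => $name,
            'size'  => $stat['size'],
            'mtime' => date('c', $stat['mtime']),
        ];
    }
    return $entries;
}

foreach (listSharedMemoryObjects() as $entry) {
    printf("%-40s %10d bytes  %s\n", $entry['name'], $entry['size'], $entry['mtime']);
}
```

Running it before and after a hang shows whether the set of objects or their modification times changed around the time the lock got stuck.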
I believe that PHP may segfault a lot more often or be killed off by the OS more frequently than anyone realizes and the problem only becomes noticeable when using extensions like sync. But replicating/tracing the behavior consistently is likely to be extremely difficult.
It's certainly also possible that I missed something in the sync codebase. Reader/writers are a combination of event, semaphore, and mutex objects.
I will try to investigate it. My php-fpm daemon is set up to terminate PHP processes after they have served a certain number of requests. In the /etc/php/7.4/fpm/pool.d/www.conf config file it is set to 500 requests:
; The number of requests each child process should execute before respawning.
; This can be useful to work around memory leaks in 3rd party libraries. For
; endless request processing specify '0'. Equivalent to PHP_FCGI_MAX_REQUESTS.
; Default Value: 0
pm.max_requests = 500
Maybe this is the cause?
Anyway, if I explicitly (by hand) unlock the hanging lock from a specially prepared PHP script by invoking writeunlock() with the specified key, it is unlocked. It just seems that sometimes the lock doesn't get released, but I don't know at which stage: when I unlock it with readunlock() / writeunlock(), when the script terminates due to a PHP error, or when it is terminated by php-fpm.
Edit: If you know of any way to improve debugging of this case, please give me a hint. Maybe I can do something in my PHP scripts to better understand what's going on?
Edit 2, another clue: could it be caused by reloading php-fpm with this command?
systemctl reload php7.4-fpm.service
I sometimes make changes in the PHP code and then need to reload fpm because I'm using OPcache.
Anyway, thanks for the tip about /dev/shm. I'm not a Linux expert, and it might be useful for debugging the error, finding the exact situation where it happens, and then trying to reproduce it.
If I manage to reproduce the error and get consistent results, I'll let you know.
Max requests should have nothing to do with it. Those are clean startups/shutdowns of PHP core. The max requests option is to deal with memory leaks in some extensions/libraries.
Being able to unlock a lock held by another process sounds quite suspicious to me.
There are two ways to debug locking code. Make one or two scripts that either run really slowly (i.e. throw in a bunch of sleep statements to precisely control which script does what and when) or run really fast in a loop (stress test). Then run the script(s) in a couple of terminal sessions until you've determined that the code either works or enters a failure/error state (e.g. deadlock).
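A minimal sketch of the "run really slowly" variant might look like this. The function name `holdWriteLock`, the log format and the 10-second hold time are assumptions for illustration. The script takes a lock name as its first CLI argument and only touches the real extension if it is loaded.

```php
<?php
// Hypothetical debugging helper: take the write lock, hold it for a while,
// release it, and log every step with the PID and a timestamp.
function holdWriteLock($lock, int $timeoutMs, int $holdMicros, callable $log): bool
{
    $pid = getmypid();
    $log(sprintf('[%d] %.3f waiting for writelock', $pid, microtime(true)));

    if (!$lock->writelock($timeoutMs)) {
        $log(sprintf('[%d] %.3f writelock timed out', $pid, microtime(true)));
        return false;
    }

    $log(sprintf('[%d] %.3f writelock acquired', $pid, microtime(true)));
    usleep($holdMicros); // simulate slow work while holding the lock

    $released = $lock->writeunlock();
    $log(sprintf('[%d] %.3f writeunlock returned %s', $pid, microtime(true), var_export($released, true)));
    return true;
}

// Example: php hold.php my-lock-name   (run in two terminals at once)
if (PHP_SAPI === 'cli' && extension_loaded('sync') && isset($argv[1])) {
    $lock = new \SyncReaderWriter($argv[1]);
    holdWriteLock($lock, 5000, 10 * 1000000, function ($msg) {
        echo $msg, PHP_EOL;
    });
}
```

Started in two terminals with the same lock name, the second process should log a timeout after about 5 seconds while the first still holds the lock. For the stress-test variant, the same function can be called in a tight loop with a hold time of zero.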
Reload usually sends SIGHUP to a process. I'm not sure how PHP handles that signal in php-fpm. It should wait for all processes to exit cleanly and then restart them.
Yes, but the script that causes the hanging write lock is executed by an HTTP request, not in CLI mode, and it terminates one way or another: either the request ends normally or there is some kind of error or premature end of the script. As I said, though, I didn't notice any fatal errors in the PHP error log.
After that, when I run a specially prepared script from the web browser that does only one thing, writeunlock() on the key that is hanging, the lock is released.
So it's not that the script that took the lock is hanging. Even if it were, PHP itself limits execution time in HTTP mode, so it would be terminated after about 120 seconds, while the lock hangs "forever".
As I said, I will try to investigate and reproduce the error, and I'll let you know when it happens. I'm planning to do it this coming weekend.