CodeIgniter4 icon indicating copy to clipboard operation
CodeIgniter4 copied to clipboard

Bug: [Session] Redis session lock error

Open kenjis opened this issue 4 years ago • 7 comments

Describe the bug Multiple Ajax communications at the same time sometimes result in session errors. In rare cases, a key is created by another process between the time the $ttl is retrieved and the time the key is set.

Reported: (Japanese) https://groups.google.com/g/codeigniter_jp/c/h65Q7KZtDN4/m/Ef_Vl8DmAAAJ.

CodeIgniter 4 version develop branch (after 4.1.1)

Affected module(s) Session RedisHandler

Expected behavior Use setnx to acquire the lock. If it fails, retry after some time.

Ref https://github.com/bcit-ci/CodeIgniter/issues/5382

kenjis avatar Mar 06 '21 00:03 kenjis

The original reporter used CI3, and indicated the same problem would happen in CI4.

By the way, the current implementation of CI4 is a little different from that of CI3.

Ci3: https://github.com/bcit-ci/CodeIgniter/blob/324628c27ca82e89d5e3a85034127835d29dd9fc/system/libraries/Session/drivers/Session_redis_driver.php#L420-L443

CI4: https://github.com/codeigniter4/CodeIgniter4/blob/902320a282cc99dac735a0741f210d821e05c3f4/system/Session/Handlers/RedisHandler.php#L344-L361

kenjis avatar Mar 06 '21 01:03 kenjis

Improved code (CI3) from the original report:

protected function _get_lock($session_id)
{
   //log_message('error', 'lock' . getmypid() . date(' i:s'));
   // PHP 7 reuses the SessionHandler object on regeneration,
   // so we need to check here if the lock key is for the
   // correct session ID.
   if ($this->_lock_key === $this->_key_prefix.$session_id.':lock')
   {
      return $this->_redis->setTimeout($this->_lock_key, 300);
   }

   // 300 attempts(about 30 second) to obtain a lock, in case another request already has it
   $lock_key = $this->_key_prefix.$session_id.':lock';
   $attempt = 0;
   do
   {
      // Use setnx to perform exclusions with other processes using the lock key. Loop if the lock key cannot be set.
      $result = $this->_redis->set($lock_key, time(), array('nx', 'ex' => 300));
      if ( ! $result)
      {
         usleep(100000);
         continue;
      }
      $result = $this->_redis->setex($lock_key, 300, time());
 
      $this->_lock_key = $lock_key;
      break;
   }
   while (++$attempt < 300);
   if ($attempt === 300)
   {
      log_message('error', 'Session: Unable to obtain lock for '.$this->_key_prefix.$session_id.' after 300 attempts, aborting.');
      return FALSE;
   }

   $this->_lock = TRUE;
   return TRUE;
}

kenjis avatar Mar 06 '21 02:03 kenjis

Is this only an issue with Redis? Why not the other drivers?

If the implementation looks solid feel free to open it as a PR. I haven't ever used Redis but hopefully one of the other maintainers is familiar for a review.

MGatner avatar Mar 06 '21 13:03 MGatner

Is this only an issue with Redis? Why not the other drivers?

I don't know. But it is a race condition. If another driver implements the lock mechanism like that, it would happen everywhere.

kenjis avatar Mar 18 '21 05:03 kenjis

I'm also seeing some session failures with Redis in my prod environment on AWS EC2 + ElasticCache but those happen rarely only from brute force bot requests.

I initially set up Redis on the EC2 where PHP was hosted and sessions were constantly failing no matter what the instance size was. Having redis on ElasticCache improved things are lot but there are still some issues.

When file is being used for the sessions I'm not seeing any issues but I have multiple nodes so file (ramdisk) will not work for me

najdanovicivan avatar Sep 11 '21 07:09 najdanovicivan

I agree that setNx() is the correct way to handle the locking mechanism. I've used it before and I don't believe I've seen issues with multiple requests. I think it's possible that one process can be waiting on the lock (via sleep(1)) while another request can come in, see the ttl is zero, and obtain the lock.

This brings up another concern though - do we need locks in every situation? Seems that enforcing one-at-a-time access to session data can cause performance issues. Here are some things I'd like to consider implementing:

  • Use setNx()
  • More frequent lock checking. Use usleep(10000) instead of sleep(1). Most response times are sub-second (hopefully!), so our session lock checks should be sub-second as well.
  • New config option to turn off session locks. If a user is OK with the risk of multiple processes potentially messing up a session object, let them set a configuration flag to turn off locks. My company, for example, has tons of session reads, very infrequent session writes, so we'd be fine with this. Disabling locks would increase performance.
  • Defer locking until we do a set()? At this point we would reload session data to ensure we have the latest and greatest data, in case some other process changed it, then apply our changes, then hold the lock until we write, then release.
  • Don't attempt to write data unless we did a set()/have a lock set.

mv-steven avatar Jan 07 '22 16:01 mv-steven

ERROR - 15:49:27 10-12-2023 --> Session: Error while trying to free lock for ci_session:ci_session:7aa8677c7de05ba9ad167233d5a2cddfee9e5e2f:lock ERROR - 14:31:21 10-12-2023 --> Session: Unable to obtain lock for ci_session:ci_session:a489f0c94cb43cba56f6c0a97ef79359c3cf9a47 after 30 attempts, aborting. ERROR - 14:31:26 10-12-2023 --> Session: Unable to obtain lock for ci_session:ci_session:a489f0c94cb43cba56f6c0a97ef79359c3cf9a47 after 30 attempts, aborting. ERROR - 15:48:59 10-12-2023 --> Session: Error while trying to free lock for ci_session:ci_session:0f55e709dcae28ebc1b381dd1fe4f7cb07b222f6:lock ERROR - 17:02:48 10-12-2023 --> Session: Error while trying to free lock for ci_session:ci_session:42d84fae09e8967219b133b14e34a3b292307ecb:lock ERROR - 17:02:49 10-12-2023 --> Session: Error while trying to free lock for ci_session:ci_session:42d84fae09e8967219b133b14e34a3b292307ecb:lock ERROR - 18:11:56 10-12-2023 --> Session: Error while trying to free lock for ci_session:ci_session:9e47dfb8f009afa6f7b3e387e3c1a6005134e9d4:lock

How to fix it?

motoroller1983 avatar Dec 11 '23 07:12 motoroller1983

@motoroller1983 Try #8323

kenjis avatar Feb 21 '24 10:02 kenjis

@najdanovicivan Try #8323 if you can. And I found the cause. See https://github.com/codeigniter4/CodeIgniter4/pull/8323#issuecomment-1958678913

kenjis avatar Feb 22 '24 04:02 kenjis

@kenjis I've tried #8323 on production server in high load environment

i'm still getting the errors.

ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> session_start(): Failed to read session data: user (path: tcp://sessions.u8ku36.0001.use1.cache.amazonaws.com:6379)
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> session_start(): Failed to read session data: user (path: tcp://sessions.u8ku36.0001.use1.cache.amazonaws.com:6379)
ERROR - 2024-02-22 20:38:47 --> session_start(): Failed to read session data: user (path: tcp://sessions.u8ku36.0001.use1.cache.amazonaws.com:6379)
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> session_start(): Failed to read session data: user (path: tcp://sessions.u8ku36.0001.use1.cache.amazonaws.com:6379)
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> session_start(): Failed to read session data: user (path: tcp://sessions.u8ku36.0001.use1.cache.amazonaws.com:6379)
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> session_start(): Failed to read session data: user (path: tcp://sessions.u8ku36.0001.use1.cache.amazonaws.com:6379)
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> Session: Unable to obtain lock for ci_session:session:maihhj7t9v0if2ibva2j57p4j1q82k6p after 300 attempts, aborting.
ERROR - 2024-02-22 20:38:47 --> session_start(): Failed to read session data: user (path: tcp://sessions.u8ku36.0001.use1.cache.amazonaws.com:6379)
ERROR - 2024-02-22 20:38:47 --> session_start(): Failed to read session data: user (path: tcp://sessions.u8ku36.0001.use1.cache.amazonaws.com:6379)

najdanovicivan avatar Feb 22 '24 21:02 najdanovicivan

@najdanovicivan If the error message changes (Session: Error while trying to free lock for -> Session: Unable to obtain lock for), then the lock mechanism itself is working correctly.

If the lock cannot be acquired, the session handler waits for 0.1 second and retries 300 times, but if the lock still cannot be acquired, the error occurs.

I think you need to close the session earlier or extend the time to wait to acquire the lock.

Or is the inability to acquire a lock considered an anomaly that cannot happen? If so, there might still be a bug in the session handler.

kenjis avatar Feb 22 '24 21:02 kenjis

@kenjis I did some more investigating and the Handler works property

The issue was happening due to not traffic requests. The bot was sending 100 requests at the same time. So i just had to make some tweaks to Throttler config.

najdanovicivan avatar Feb 22 '24 22:02 najdanovicivan

@motoroller1983 while you waiting for release you can create custom handler class.

Just copy the contents of https://github.com/kenjis/CodeIgniter4/blob/fix-redis-session/system/Session/Handlers/RedisHandler.php somewhere in App namespace for example /App/Session/Handlers/RedisHandler

You then need to change 2 thinks in code change namespace from namespace CodeIgniter\Session\Handlers; to namespace App\Session\Handlers; then add use statement for parent class use CodeIgniter\Session\Handlers\BaseHandler;

Once that is done you have your own custom handler you just need to update Config\Session to use it. I do this in .env file session.driver = 'App\Session\Handlers\RedisHandler'

najdanovicivan avatar Feb 23 '24 07:02 najdanovicivan