Redis->auth() sometimes freezes after upgrading to PHP 7.4
Hi,
After upgrading to PHP 7.4, a problem appeared with auth() randomly freezing. The same script executed with PHP 7.3.15 (and the same phpredis version) works fine. It seems that execution is blocked until the server-side TCP keepalive timeout is reached. Since the phpredis versions are identical, I'm not sure this is a phpredis issue, but maybe somebody can help with further investigation. It also seems strange that the read timeout option does not affect the auth() call.
Expected behaviour
auth() should throw a read timeout exception after 5 seconds
Actual behaviour
execution is blocked for 300 seconds until the TCP keepalive timeout is reached
I'm seeing this behaviour on
- OS: Ubuntu 18.04.4 LTS
- Redis: 5.0.6 (digitalocean managed instance)
- PHP: 7.4.3
- phpredis: 5.2.0
Steps to reproduce, backtrace or example script
example script:
<?php
function l($message)
{
    global $current;
    echo $message . ' (' . (microtime(true) - $current) . ')' . PHP_EOL;
    $current = microtime(true);
}

ini_set('default_socket_timeout', 5);

$start = microtime(true);
$current = microtime(true);
l('start');

$redis = new Redis();
if (!$redis->connect('tls://redis-instance', 25061, 5, null, 5, 5)) {
    throw new RuntimeException($redis->getLastError());
}
l('connected');

$redis->setOption(Redis::OPT_READ_TIMEOUT, 5);
l('option is set');

$redis->auth('some-password');
l('auth complete');

$redis->select(4);
l('db selected');

echo 'Total: ' . (microtime(true) - $start) . PHP_EOL;
Occasionally this script hangs for 300 seconds:
user:~$ while true; do php redis.php; done
start (5.9604644775391E-6)
connected (0.017167091369629)
option is set (5.0067901611328E-6)
auth complete (0.00058794021606445)
db selected (0.00032281875610352)
Total: 0.018187999725342
start (4.0531158447266E-6)
connected (0.012931108474731)
option is set (7.1525573730469E-6)
auth complete (0.00078892707824707)
db selected (0.00040698051452637)
Total: 0.014256000518799
start (4.0531158447266E-6)
connected (0.01163911819458)
option is set (1.0967254638672E-5)
auth complete (300.07370495796)
db selected (0.00037598609924316)
Total: 300.08729100227
strace log:
write(3, "\27\3\3\0006[O\21\240\275|H\243\356\344\16\277\215\243\362h;\350*<N@\24\25\225\254["..., 59) = 59
fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
read(3, "\27\3\3\0\26", 5) = 5
read(3, "\262\341\277k3\317O\3143X\232\313\310\2742\246\v\203\341\332T\261", 22) = 22
fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR) = 0
write(1, "2020-03-17 10:49:05 auth complet"..., 522020-03-17 10:49:05 auth complete (300.79558706284)
) = 52
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
write(3, "\27\3\3\0(\245\360\223\217\252\375p\311\324\246\323\1\303\243\360\343\302\2:\214\5'\"\224\32\2500"..., 45) = 45
fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR) = 0
poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 0 (Timeout)
fcntl(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
read(3, "\27\3\3\0\26", 5) = 5
read(3, "\327n\353\321\367\354\317[\274&\316)\237C\304\305\275\343\4\266\334\10", 22) = 22
fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR) = 0
write(1, "2020-03-17 10:49:05 db selected "..., 542020-03-17 10:49:05 db selected (0.00063395500183105)
I've checked
- [x] There is no similar issue from other users
- [x] Issue isn't fixed in the develop branch
Interesting, thanks for the report. I'll try to replicate the issue locally.
I just created a fresh droplet + managed Redis on DigitalOcean
# add-apt-repository ppa:ondrej/php
# apt-get update
# apt-get install php-redis php7.3-cli php7.4-cli
With all default settings the issue is easily reproducible:
# while true; do php7.3 ~/redis.php; done
vs
# while true; do php7.4 ~/redis.php; done
I can provide access if it helps.
Small update: the issue can be reproduced only with a TLS connection.
I can confirm the identical issue on PHP 7.4.4 + php-redis 5.2.1, with DigitalOcean managed Redis + TLS.
This issue does not occur when connecting from the same droplet(s) to the same managed Redis instance with PHP 7.3.16 + php-redis 5.2.1, with the older PHP 7.2.27 + php-redis 4.2.0, or with the same PHP 7.4.4 + predis (Laravel).
I've thrown together this attempt at a full-stack repro, but I can't seem to trigger it locally (only with managed Redis): https://github.com/AlbinoDrought/php-redis-7.4-tls-freeze-repro
I have discovered this same issue independently. I can confirm that I see the issue only with TLS connections to redis.
I have experimented with the latest Redis (6), which has TLS support built in, as well as Redis (5), which has no TLS support (so I placed nginx with TLS in front of Redis). In every case I see timeouts occurring with TLS connections. Sometimes they are frequent, such as 1 out of 5 connections timing out; sometimes all connections time out. But always, while a TLS timeout is occurring, the non-TLS Redis connections can still get through. This leads me to believe that Redis is actually not the issue, but rather the issue is somewhere in the TLS connection itself.
I am using/testing primarily on PHP 7.4.4-fpm on Alpine Linux 3.11. I also tested the same version of PHP on Debian Linux (10/Buster); the same problem occurred.
I am executing in containers on Docker in 3 environments:
- Compose: single node, non-clustered
- Swarm Local: single node, clustered
- Swarm Cloud: multi-node, clustered
And I see the same problem in all cases.
After further testing on different versions of php-fpm:
I find the problem exists on versions: 7.4.5-fpm, 7.4.4-fpm, 7.4.3-fpm, 7.4.2-fpm, 7.4.1-fpm, 7.4.0-fpm.
I find the problem does not exist on version: 7.3.17-fpm.
From the PHP changelog (https://www.php.net/ChangeLog-7.php) I can see that several OpenSSL-related updates were made in version 7.4.0, which is the first broken version after 7.3.17, the last working version.
I am not going to look into the source code on this one, but if I had to guess a likely culprit from the changelog, it would probably be one of:
- Added TLS 1.3 support to streams including new tlsv1.3 stream.
- Added openssl_x509_verify function.
Some searching around on the PHP bug tracking website (https://bugs.php.net) turns up an interesting build bug for the 7.4 branch (https://bugs.php.net/bug.php?id=78345) which suggests that several OpenSSL tests are being ignored (because they are failing). That sounds like it would leave an opportunity for an OpenSSL-related bug to propagate into the 7.4 branch.
I will file a bug on their site and reference this thread, then update this thread to reference the bug on their site. OK, I filed it at: https://bugs.php.net/bug.php?id=79501 (at the moment it seems to be a "private" link, so perhaps the bug report is being evaluated before the link can be viewed by anyone)
I have some information from the good people at bugs.php.net
[email protected] informed me: "One relevant difference that comes to mind is that PHP 7.4 will negotiate TLS 1.3 by default. Do you know which TLS version actually gets used?"
So, question... Do we know which TLS version is involved in the connections that freeze vs. the connections that don't freeze?
@inieves could you try to use tlsv1.2:// when connecting to the Redis server?
I have verified that TLS 1.2 connections work consistently, while TLS 1.3 connections freeze (fail to establish) on PHP 7.4.5-fpm.
On the other side of the testing, both TLS 1.2 and TLS 1.3 connections work consistently on PHP 7.3.17-fpm.
To perform this test I put nginx in front of my Redis, disabled TLS on Redis entirely, and enabled TLS on nginx. I varied the TLS settings on nginx as follows: TLS 1.2 only, then TLS 1.3 only, then both TLS 1.2 and TLS 1.3.
Interesting to note, when nginx has:
- only TLS 1.2, the connections always work
- only TLS 1.3, the connections always freeze
- both TLS 1.2 and TLS 1.3, the connections work for the first 5 attempts, and then one freezes. If I attempt another connection then the next 4 or 5 connections succeed and then another one freezes. And this cycle repeats.
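For reference, the nginx front-end described above can be sketched as a stream proxy. This is a hypothetical config, not my actual one: the listen port, certificate paths, and upstream address are placeholders.

```nginx
# Hypothetical nginx stream proxy terminating TLS in front of a plain-TCP Redis.
stream {
    upstream redis_backend {
        server 127.0.0.1:6379;            # non-TLS Redis
    }

    server {
        listen 6380 ssl;
        ssl_certificate     /etc/nginx/certs/redis.crt;
        ssl_certificate_key /etc/nginx/certs/redis.key;

        # Vary this line to test each scenario:
        # ssl_protocols TLSv1.2;          # connections always work
        # ssl_protocols TLSv1.3;          # connections always freeze
        ssl_protocols TLSv1.2 TLSv1.3;    # mixed: periodic freezes

        proxy_pass redis_backend;
    }
}
```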
@yatsukhnenko I have verified that tlsv1.2:// results in consistent working connections. Also, tlsv1.3:// results in some connections that work and most other connections that fail.
It seems that when I constrain phpredis to TLS 1.3 I get a noticeably higher percentage of frozen connections than if I constrain nginx to TLS 1.3.
I can also say that the cipher suite used in all my TLS 1.3 connections (both the consistent and the freezing ones) is TLS_AES_256_GCM_SHA384.
@shurastik @AlbinoDrought @inieves can you try to rewrite your test using PHP streams? This would help the PHP team identify the source of the problem, because right now it looks like something may be wrong in phpredis even though it just wraps PHP streams.
@yatsukhnenko Is there a difference between phpredis on PHP 7.4.5-fpm as compared to PHP 7.3.17-fpm? Because that is the only change that I made in my testing, and that alone was enough to make a difference for TLS 1.3 connection freeze/latency.
I did find some evidence that TLS 1.3 has some fundamental protocol-level differences compared to TLS 1.2. I learned that these differences are not accounted for in many usages of TLS 1.3, and this can cause connections to hang:
- https://github.com/openssl/openssl/blob/6e94b5aecd619afd25e3dc25902952b1b3194edf/CHANGES#L236
- https://github.com/openssl/openssl/issues/7327
- https://wiki.openssl.org/index.php/TLS1.3#Non-application_data_records (see Non-application data records section at bottom)
- https://bugs.openjdk.java.net/browse/JDK-8208526 (here you can read about a potentially similar effect as we are seeing, but in a Java application)
@shurastik @AlbinoDrought @yatsukhnenko I do not have any reasonable way to rewrite the test using streams; I did not write the original test. My own testing was based purely on phpredis/nginx/redis.
I'm going to close this issue because it is actually not a bug in phpredis.
@yatsukhnenko do you believe this is most likely a bug in PHP itself (or perhaps OpenSSL)?
Any update on this? I'm running into the same issue. It is especially noticeable when the server has a timeout set (closing idle connections): phpredis gets stuck on any command after the idle time until the whole client timeout is reached, and then errors out with "Timed out attempting to find data in the correct node!".
I am using a Redis Cluster 6.0.x with the latest phpredis extension on PHP 7.4, with the tls scheme and connection pooling enabled.
Glad this thread exists! This was our problem exactly!
We're using DigitalOcean-managed Redis, php7.4.3, and Ubuntu 20.04.3.
Switching the connection string from using tls:// to tlsv1.2:// solved the issue for us!
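For anyone landing here, a minimal sketch of that change (host, port, and password are placeholders):

```php
<?php
// Workaround sketch: pin the scheme to TLS 1.2 instead of letting
// PHP 7.4+ negotiate TLS 1.3 via the generic tls:// scheme.
$redis = new Redis();

// Before (freezes intermittently on PHP 7.4):
// $redis->connect('tls://redis-instance', 25061, 5);

// After (works consistently for us):
$redis->connect('tlsv1.2://redis-instance', 25061, 5);
$redis->auth('some-password');
```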
Same issue with AWS Redis. The same solution helped. But after investigating further, it's a general issue in PHP; there is even a bug report.
I know that switching to tlsv1.2 works, but I was trying to debug this issue and find the root cause, and I found this issue: https://github.com/ruby/openssl/issues/449
They also mention an issue with TLS 1.3 and the cipher TLS_AES_256_GCM_SHA384.
So I guess the error doesn't come from either phpredis or PHP but from OpenSSL itself? Or maybe the error is wrongly handled somewhere?
I managed to reproduce this error on my machine using PHP 7.4.27 with debugging enabled, and the script always freezes after a few Redis queries (I removed the auth part as my Redis had no auth set up).
Here is the strace (I killed it with a SIGINT after waiting 1109 seconds)
80 22:51:12.696760 (+ 0.000221) fcntl(3, F_GETFL) = 0x802 (flags O_RDWR|O_NONBLOCK)
80 22:51:12.696882 (+ 0.000118) fcntl(3, F_SETFL, O_RDWR) = 0
80 22:51:12.697078 (+ 0.000207) setsockopt(3, SOL_TCP, TCP_NODELAY, [1], 4) = 0
80 22:51:12.697236 (+ 0.000153) setsockopt(3, SOL_SOCKET, SO_KEEPALIVE, [0], 4) = 0
80 22:51:12.697430 (+ 0.000194) write(1, "connected (0.016987800598145)\n", 30) = 30
80 22:51:12.697672 (+ 0.000255) write(1, "option is set (4.7922134399414E-"..., 35) = 35
80 22:51:12.697914 (+ 0.000225) poll([{fd=3, events=POLLIN|POLLPRI|POLLERR|POLLHUP}], 1, 0) = 1 ([{fd=3, revents=POLLIN}])
80 22:51:12.698060 (+ 0.000143) read(3, "\27\3\3\1\n", 5) = 5
80 22:51:12.698196 (+ 0.000143) read(3, "f\227\340\23\32\203\201AD\322T\200\201\276\277\206\352\242\22B\356\245VHxz\243\273\nt(\252"..., 266) = 266
80 22:51:12.698479 (+ 0.000279) read(3, "\27\3\3\1\n", 5) = 5
80 22:51:12.698630 (+ 0.000149) read(3, "\302\32r\203\223{\337U\32402\36\33\276\f)\317\31FT\346&\301\312\375\21\212\226\271q\255,"..., 266) = 266
80 22:51:12.698795 (+ 0.000163) read(3, 0x55581c073803, 5) = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
80 23:09:41.727790 (+ 1109.029052) --- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
Here is the program output:
Total: 0.21900486946106
start (5.9604644775391E-6)
connected (0.0046069622039795)
option is set (9.0599060058594E-6)
db selected (0.00092291831970215)
Total: 0.22468900680542
start (2.6226043701172E-5)
connected (0.0028791427612305)
option is set (8.1062316894531E-6)
db selected (0.0010368824005127)
Total: 0.22873783111572
start (2.3126602172852E-5)
connected (0.0033941268920898)
option is set (3.6001205444336E-5)
^C
Program received signal SIGINT, Interrupt.
And here is the gdb backtrace:
#0 0x00007fc4b54b8461 in __GI___libc_read (fd=3, buf=0x55fde5080863, nbytes=5) at ../sysdeps/unix/sysv/linux/read.c:26
#1 0x00007fc4b5ab1afe in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#2 0x00007fc4b5aace1a in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#3 0x00007fc4b5aabcc3 in ?? () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#4 0x00007fc4b5aac273 in BIO_read () from /usr/lib/x86_64-linux-gnu/libcrypto.so.1.1
#5 0x00007fc4b5cfb61f in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#6 0x00007fc4b5cff4aa in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#7 0x00007fc4b5cfce10 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#8 0x00007fc4b5d04255 in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#9 0x00007fc4b5d0cf6e in ?? () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#10 0x00007fc4b5d0eeb3 in SSL_peek () from /usr/lib/x86_64-linux-gnu/libssl.so.1.1
#11 0x000055fde1c7f9a8 in ?? ()
#12 0x000055fde2090002 in _php_stream_set_option ()
#13 0x000055fde208eb3d in _php_stream_eof ()
#14 0x00007fc4b256ef52 in redis_check_eof (redis_sock=0x7fc4b225e280, no_throw=0) at /tmp/pear/temp/redis/library.c:329
#15 0x00007fc4b2575be4 in redis_sock_write (redis_sock=0x7fc4b225e280, cmd=0x7fc4b227c500 "*2\r\n$6\r\nSELECT\r\n$1\r\n4\r\n", sz=23) at /tmp/pear/temp/redis/library.c:2779
#16 0x00007fc4b254627b in zim_Redis_select (execute_data=0x7fc4b2214210, return_value=0x7ffd39311f70) at /tmp/pear/temp/redis/redis.c:2066
#17 0x000055fde2183cd9 in ?? ()
#18 0x000055fde21edb79 in execute_ex ()
#19 0x000055fde21f1c5c in zend_execute ()
#20 0x000055fde210f76a in zend_execute_scripts ()
#21 0x000055fde206ea40 in php_execute_script ()
#22 0x000055fde21f486a in ?? ()
#23 0x000055fde21f5a81 in ?? ()
#24 0x00007fc4b53f209b in __libc_start_main (main=0x55fde21f51d5, argc=2, argv=0x7ffd39315a78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffd39315a68) at ../csu/libc-start.c:308
#25 0x000055fde1c0490a in _start ()
It seems that the freeze starts here: https://github.com/phpredis/phpredis/blob/0719c1eca0f3ce5650f51dfe878350147d424bbe/library.c#L317, then goes to https://github.com/php/php-src/blob/3240a7476210fb524ba85f4572294d018eb73f71/main/streams/streams.c#L788, and ends somewhere in the OpenSSL extension.
I am not sure how to debug the extension itself; I'm going to do some research on this topic and will also see if I can reproduce it without phpredis, using streams only.
Extra notes:
- I captured the TLS traffic with Wireshark and there is no activity at all during the freeze period; it seems that the client is waiting for data while the server has nothing to send
- The freeze also happens on PHP 8.0.15
@lbassin is it possible to get a dump/output of the TLS connection handshake for a connection that times out?
I know that TLS 1.3 generally requires 1 round trip, while TLS 1.2 generally requires 2 round trips to establish a connection. If the client is timing out during connection setup while waiting to read more (rather than outright failing due to an invalid server response), it sounds like the client is behaving as if it were in TLS 1.2 mode while the server is behaving in TLS 1.3 mode (by responding to the first contact but not the second).
It seems to me more and more like there is no bug in OpenSSL per se (other than perhaps less-than-crystal-clear documentation),
but rather the bugs are in the language-dependent wrappers on top of OpenSSL, which are not properly accounting for subtle but key differences between TLS 1.2 and TLS 1.3.
With TLS 1.3 on OpenSSL, a wrapper can be signaled that there is data to be read, but that data may be protocol control messages rather than application data, so the application-data read function will hang if it is called.
https://github.com/openssl/openssl/issues/8419 https://github.com/openssl/openssl/issues/7327
Is anyone here building their own PHP from source? If so, there are two build flags (#defines) that might shine some light on the situation:
#define STREAM_DEBUG 1
#define ZEND_DEBUG 1
With these flags enabled during the build, more (hopefully useful) log output can be generated.
Really amazing results here. It looks like we might be able to push this through and get it fixed at the stream-wrapper level.
https://bugs.php.net/bug.php?id=79501 is the PHP bug report. Maybe you can publish your findings there as well?
I was able to determine the exact line that blocks in PHP 7.4 (7.4.27) when TLS 1.3 is being used.
The offending line is:
https://github.com/php/php-src/blob/96f753a2b56b7c4927f1a64253ca60bb481ee2c3/ext/openssl/xp_ssl.c#L2466
int n = SSL_peek(sslsock->ssl_handle, &buf, sizeof(buf));
The PHP method that calls this SSL_peek is itself called at (for reference): https://github.com/php/php-src/blob/96f753a2b56b7c4927f1a64253ca60bb481ee2c3/main/streams/streams.c#L790
if (!stream->eof && PHP_STREAM_OPTION_RETURN_ERR ==
        php_stream_set_option(stream, PHP_STREAM_OPTION_CHECK_LIVENESS,
                              0, NULL)) {
    stream->eof = 1;
}
After reading about so-called SSL non-application data records: https://github.com/openssl/openssl/blob/6e94b5aecd619afd25e3dc25902952b1b3194edf/CHANGES#L237 https://wiki.openssl.org/index.php/TLS1.3#Non-application_data_records https://github.com/openssl/openssl/issues/7327
I believe the blocking occurs because PHP 7.4 assumes (in xp_ssl.c at L2466, the first code snippet above) that if there is a record that can be read, then the record must be a data record and therefore SSL_peek will not block. This assumption is false in both TLS 1.2 and TLS 1.3; however, its impact is felt rarely if at all in TLS 1.2, while it arises frequently, in fact instantly, in TLS 1.3.
In TLS 1.2, this line:
https://github.com/php/php-src/blob/96f753a2b56b7c4927f1a64253ca60bb481ee2c3/ext/openssl/xp_ssl.c#L2464
} else if (php_pollfd_for(sslsock->s.socket, PHP_POLLREADABLE|POLLPRI, &tv) > 0) {
evaluates to false.
But in TLS 1.3 it evaluates to true, which enables SSL_peek to be called, and block.
I was able to get TLS 1.3 working in PHP 7.4 by wrapping the SSL_peek call and the logic immediately following it in a check on SSL_pending. SSL_pending returns the count of application data bytes that can be read (not non-application records), so if SSL_pending > 0 then SSL_peek will not block. Beyond that, SSL_pending itself never blocks, so it is safe to call.
To show what I did, I converted these lines from: https://github.com/php/php-src/blob/96f753a2b56b7c4927f1a64253ca60bb481ee2c3/ext/openssl/xp_ssl.c#L2466
int n = SSL_peek(sslsock->ssl_handle, &buf, sizeof(buf));
if (n <= 0) {
    int err = SSL_get_error(sslsock->ssl_handle, n);
    switch (err) {
        case SSL_ERROR_SYSCALL:
            alive = php_socket_errno() == EAGAIN;
            break;
        case SSL_ERROR_WANT_READ:
        case SSL_ERROR_WANT_WRITE:
            alive = 1;
            break;
        default:
            /* any other problem is a fatal error */
            alive = 0;
    }
}
to these lines:
if (SSL_pending(sslsock->ssl_handle)) {
    int n = SSL_peek(sslsock->ssl_handle, &buf, sizeof(buf));
    if (n <= 0) {
        int err = SSL_get_error(sslsock->ssl_handle, n);
        switch (err) {
            case SSL_ERROR_SYSCALL:
                alive = php_socket_errno() == EAGAIN;
                break;
            case SSL_ERROR_WANT_READ:
            case SSL_ERROR_WANT_WRITE:
                alive = 1;
                break;
            default:
                /* any other problem is a fatal error */
                alive = 0;
        }
    }
}
I am not claiming this fixes the problem entirely or that there are no side effects, as I did not run any automated tests or any other manual tests. But the improved behavior in our particular use case does make this a likely location for the flawed code.
In summary, I think the issue is related to how PHP streams wrap OpenSSL. The problem seems to be the assumption that if records are available to read then SSL_peek will not block. Finally, although this mostly shows itself with TLS 1.3, it is quite likely also a bug in PHP streams with TLS 1.2, although a very (possibly very, very) rare one.
For reference on that last sentence, see the first comment by mattcaswell at https://github.com/openssl/openssl/issues/7327
I would like to send some updated info to the folks at PHP, so if anyone has any more thoughts/ideas/debug experiences/fixes, consider sharing in the comments here so we can ship a more complete assessment to them.
https://www.openssl.org/docs/man1.1.1/man3/SSL_pending.html https://www.openssl.org/docs/man1.1.1/man3/SSL_read.html
Sorry @inieves, I didn't have any free time yesterday to send you the Wireshark dump, but it doesn't seem useful anymore. That's a really great analysis, and it matches the backtrace I provided the other day from when the freeze occurs.
I think you can already share this thread, with all its data, with the PHP folks; it should be more than enough to at least get their thoughts on the topic. On my side, I'll see if I can add more things during the day.
ok I have updated the bug report at PHP:
https://bugs.php.net/bug.php?id=79501
Can you guys have a look at this repo? https://github.com/lbassin/php-bug-tls1.3-blocking I was trying to write a test case for a potential PR submission to the PHP repo, and I think I found how to reproduce the issue simply, without anything extra.
There is a server and a client. The server starts listening for a TLS 1.3 connection, waits 5 seconds (it seems that some non-application data is sent during this waiting time), and then sends some data. On the client side, I check whether any data is available using stream_select(), and if the select function says there is, I try to read it using fread(), which is a blocking function.
Using TLS 1.2, here is the output: (screenshot in original post)
Using TLS 1.3, here is the output: (screenshot in original post)
In the TLS 1.3 situation we can see that the select function tells us there is data available, but nothing is actually readable (nothing has been sent), and therefore fread() hangs until the server actually sends data 5 seconds later.
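The client side of that repro boils down to the following sketch (the host/port are placeholders, a live TLS 1.3 server is assumed, and certificate verification is disabled only for the sake of the test):

```php
<?php
// Sketch: stream_select() reports the socket readable because a TLS 1.3
// non-application record (e.g. a session ticket) arrived, but fread()
// then blocks because no application data is available yet.
$ctx = stream_context_create(['ssl' => [
    'verify_peer'      => false,   // test-only; do not use in production
    'verify_peer_name' => false,
]]);

$fp = stream_socket_client('tlsv1.3://127.0.0.1:8443', $errno, $errstr,
                           5, STREAM_CLIENT_CONNECT, $ctx);

$read = [$fp];
$write = $except = null;
if (stream_select($read, $write, $except, 1) > 0) {
    // select() says "readable", but this fread() hangs until the server
    // actually sends application data:
    $data = fread($fp, 1024);
}
```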
@lbassin This looks good to me.
This bug report has also noticed the same underlying problem: https://github.com/reactphp/socket/issues/184
The problem is that when a server has sent no data, a client-side call to feof or stream_eof will block when TLS 1.3 is used, and will not block when TLS 1.2 is used.
I am quite confident my fix/workaround above is not correct.
@yatsukhnenko I was able to find a different workaround that may be relevant to your implementation of phpredis. In the case of TLS 1.3, after my call to stream_socket_client I make a call to stream_set_blocking(..., false), which seems to prevent feof and stream_eof from blocking when no data has been sent by the server.
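A sketch of that workaround, with the host/port as placeholders and a live TLS 1.3 server assumed:

```php
<?php
// Workaround sketch: switching the stream to non-blocking mode keeps
// feof()/stream_eof (and thus the liveness check) from hanging on
// TLS 1.3 non-application records when the server has sent no data.
$fp = stream_socket_client('tlsv1.3://127.0.0.1:8443', $errno, $errstr, 5);
stream_set_blocking($fp, false);

// feof() now returns immediately instead of blocking inside SSL_peek().
var_dump(feof($fp));
```

Note that non-blocking mode changes the semantics of subsequent reads, so a caller would also need to handle partial reads itself.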
While the linked issues and pull requests from both php-src and reactphp seem to suggest that TLS 1.3 should be working fine, I too experienced a significant number of blocked/stalled FPM processes while calling ->auth() or while interacting with the session in my client's Symfony application (Symfony 3.4, using phpredis as the session handler through .ini configuration, handled by Symfony through NativeSessionStorage/AbstractSessionHandler). Forcing TLS 1.2 has significantly reduced (but not completely eliminated) the problem. I'm not sure where to look further for a solution at this point.
Additional information: PHP 8.1.3, OpenSSL 1.1.1l, phpredis 5.3.6, Redis server 6.0.0 (DigitalOcean instance)