bref icon indicating copy to clipboard operation
bref copied to clipboard

Performance issues on Bref PHP 8.1 and Later

Open yh1224 opened this issue 2 years ago • 29 comments

I am experiencing a significant performance issue with the AWS SDK for PHP when running on Bref PHP 8.1 and subsequent versions. Specifically, I have observed that accessing AWS with the SDK is approximately three times slower as compared to Bref PHP 8.0.

To further analyze this, I conducted the same processes on an EC2 instance and found no noticeable performance difference between them. The version of the AWS SDK for PHP I am using is 3.269.9.

Moreover, I conducted a test measuring the time it took to execute 100 queries on DynamoDB. Here are the results:

Environment Specification Elapsed
EC2 PHP 8.0.28 c5.large 7.5 sec
EC2 PHP 8.1.18 c5.large 7.5 sec
EC2 PHP 8.2.5 c5.large 7.5 sec
Bref PHP 8.0.28 (php-80:44) Memory: 1024 MB 4 sec
Bref PHP 8.1.18 (php-81:43) Memory: 1024 MB 11 sec
Bref PHP 8.2.5 (php-82:33) Memory: 1024 MB 11 sec

yh1224 avatar May 10 '23 04:05 yh1224

Thanks for the report and the details!

The one thing that jumps to mind would be OpenSSL 3, that was applied to PHP 8.1 and 8.2: https://github.com/brefphp/aws-lambda-layers/pull/74

A quick search yield these issues for example:

  • https://github.com/openssl/openssl/issues/17064
  • https://github.com/openssl/openssl/issues/17627

Would you be able to run the same test with Bref 2.0.2 (here are the layer versions: https://runtimes.bref.sh/?region=us-east-1&version=2.0.2)? (this is before OpenSSL 3)

cc @GrahamCampbell if that rings a bell to you?

mnapoli avatar May 10 '23 07:05 mnapoli

@mnapoli

I tried using the Layer compatible with Bref 2.0.2 and there was no degradation in performance as shown below.

Environment Layer Elapsed
Bref 2.0.2 / PHP 8.0.28 arn:aws:lambda:ap-northeast-1:534081306603:layer:php-80:42 4 sec
Bref 2.0.2 / PHP 8.1.17 arn:aws:lambda:ap-northeast-1:534081306603:layer:php-81:41 4 sec
Bref 2.0.2 / PHP 8.2.4 arn:aws:lambda:ap-northeast-1:534081306603:layer:php-82:31 4 sec
Bref 2.0.4 / PHP 8.0.28 arn:aws:lambda:ap-northeast-1:534081306603:layer:php-80:44 4 sec
Bref 2.0.4 / PHP 8.1.18 arn:aws:lambda:ap-northeast-1:534081306603:layer:php-81:43 11 sec
Bref 2.0.4 / PHP 8.2.5 arn:aws:lambda:ap-northeast-1:534081306603:layer:php-82:33 11 sec

I would like to use this older version for now. Thank you very much.

yh1224 avatar May 10 '23 09:05 yh1224

Interesting, I have not noticed any excess latency on my workloads (though my workloads are super spiky, so it's hard to notice), and I have some that send 1000s of requests to dynamo using the symfony curl http client with a parallelism of 64 (a curl parameter). I am reluctant to suggest that we rollback to openssl 1.1 because of the fact that it will be EOL very soon, leaving us without any security patches. I wonder if we should do a build with openssl 3.1, and see if it also have the same performance issues for @yh1224.

GrahamCampbell avatar May 10 '23 09:05 GrahamCampbell

Moreover, I conducted a test measuring the time it took to execute 100 queries on DynamoDB.

Are you re-using the client? Can you share a minimal re-producer please?

GrahamCampbell avatar May 10 '23 09:05 GrahamCampbell

Is the same issue present on arm64 - that is, is this just an issue for x86_64?

GrahamCampbell avatar May 10 '23 09:05 GrahamCampbell

@GrahamCampbell

Are you re-using the client? Can you share a minimal re-producer please?

I create the DynamoDbClient each time.

Is the same issue present on arm64 - that is, is this just an issue for x86_64?

Same with arm64. I feel that it is a little more slower than x86_64.

Environment Layer Elapsed
arm64 / Bref 2.0.2 / PHP 8.0.28 arn:aws:lambda:ap-northeast-1:534081306603:layer:arm-php-80:43 4s
arm64 / Bref 2.0.2 / PHP 8.1.17 arn:aws:lambda:ap-northeast-1:534081306603:layer:arm-php-81:22 4s
arm64 / Bref 2.0.2 / PHP 8.2.4 arn:aws:lambda:ap-northeast-1:534081306603:layer:arm-php-82:19 4s
arm64 / Bref 2.0.4 / PHP 8.0.28 arn:aws:lambda:ap-northeast-1:534081306603:layer:arm-php-80:45 4s
arm64 / Bref 2.0.4 / PHP 8.1.18 arn:aws:lambda:ap-northeast-1:534081306603:layer:arm-php-81:24 13s
arm64 / Bref 2.0.4 / PHP 8.2.5 arn:aws:lambda:ap-northeast-1:534081306603:layer:arm-php-82:21 13s

Here is the CDK project to check. https://github.com/yh1224/aws-cdk-lambda-bref-test

The source is below. https://github.com/yh1224/aws-cdk-lambda-bref-test/tree/main/src/TestFunc

yh1224 avatar May 10 '23 12:05 yh1224

I create the DynamoDbClient each time.

This is probably the issue. You should avoid doing that. I think what's happening here is that the TLS handshake has to happen over and over. That would also explain why I haven't noticed this. Not reusing connections is very expensive.

GrahamCampbell avatar May 10 '23 13:05 GrahamCampbell

https://github.com/openssl/openssl/issues/17064

I think this issue is unrelated. There is only one thread here.

GrahamCampbell avatar May 10 '23 13:05 GrahamCampbell

This is probably the issue. You should avoid doing that. I think what's happening here is that the TLS handshake has to happen over and over. That would also explain why I haven't noticed this. Not reusing connections is very expensive.

I would argue that:

  • most PHP apps do this, so we have to consider it (it's not an edge case)
  • this lets us identify the underlying issue

Fact is that OpenSSL 3 has an issue. Reusing connections is just a workaround, and it's not good for most PHP apps.

mnapoli avatar May 10 '23 14:05 mnapoli

I strongly disagree that it is a workaround. Setting up a new TLS connection is an expensive thing to do, and application developers should always reuse connections, in the same way that it is standard practice to reuse database connections.

GrahamCampbell avatar May 10 '23 15:05 GrahamCampbell

Most PHP apps that I have seen do re-use connections. Anyone using the Laravel Guzzle wrapper will have connections re-used. Similarly, anyone using any Laravel native AWS implementations like SES or SQS will also have connections re-used.

GrahamCampbell avatar May 10 '23 15:05 GrahamCampbell

Not re-using connections and not cleaning up can lead to serious issues like exceeding the max open sockets limit on Lambda (I have done this before :trollface:).

GrahamCampbell avatar May 10 '23 15:05 GrahamCampbell

@GrahamCampbell

I understand that this is not an issue with PHP/Bref, but is due to OpenSSL 3.

I'm aware that reusing the client is inefficient, but due to the legacy code of our current production being migrated to Lambda, it seems we can't address it immediately. As the difference in execution time directly affects the cost of using Lambda, I still want to hold out with the old version at this time.

If the performance degradation with OpenSSL is due to some "problem", I would like to wait for its resolution.

Thank you very much.

yh1224 avatar May 10 '23 23:05 yh1224

@GrahamCampbell maybe we are not talking about the same thing. Maybe you are talking about reusing the client in the same request? I am talking about reusing the client between Lambda invocations.

Most of the HTTP requests served with Bref are with FPM. That means each request has to do the SSL connection again (because the client cannot be kept alive). This is why I'm saying OpenSSL 3 introduces a performance regression that impacts a majority of Bref users.

mnapoli avatar May 11 '23 09:05 mnapoli

Maybe you are talking about reusing the client in the same request?

Yes.

I am talking about reusing the client between Lambda invocations.

I see. I also do that, because I don't use php-fpm, but I appreciate many people don't.

This is why I'm saying OpenSSL 3 introduces a performance regression that impacts a majority of Bref users.

That is not so clear. We'd have to do some more investigation, maybe profiling with blackfire.

GrahamCampbell avatar May 11 '23 12:05 GrahamCampbell

In case it's useful, here's a Linux (Ubuntu) serverful example of this openssl 1.1 vs 3 issue when it comes to performance of PHP scripts making web requests: https://github.com/php/php-src/issues/10753#issuecomment-1852335196

M1ke avatar Dec 12 '23 16:12 M1ke

@M1ke thanks for sharing. We have hopes that OpenSSL 3.2 might fix things, but that needs to be tested.

mnapoli avatar Dec 13 '23 09:12 mnapoli

Hey @mnapoli I've just found out that the performance issues i told you in the other issue is because we are making https calls.

When i can make the https in plain http the performance is the same(or better) as before. Unfortunately there are some calls that we can't perform in http.

How can we test the new OpenSSL3.2? Maybe it will fix those issues. I was seeing on average 400ms with https calls and with plain http it is now 120ms on average.

With the fix i have hopes to make it under 100ms as it was before.

Thanks

atrope avatar Jan 29 '24 16:01 atrope

We are waiting for the next release of postgres 15.x. The current release does not support OpenSSL 3.2.

GrahamCampbell avatar Jan 29 '24 17:01 GrahamCampbell

@GrahamCampbell is it possible to generate a layer without postgres if we don't use it? Maybe we could have an specific version without postgres because 4x the pricing for using https is a lot.

atrope avatar Jan 29 '24 18:01 atrope

Or do you know which version of php-82-fpm has openssl 3.2? That way i can test in a stressfull env anc check if this is the actual problerm

atrope avatar Jan 29 '24 18:01 atrope

You can compile a layer yourself. Otherwise, it'll probably be around a month from now when OpenSSL 3.2.1 gets put into Bref, in combination with Postgres 15.6. It'll likely be PHP 8.1, 8.2 and 8.3. PHP 8.0 will remain on OpenSSL 1.1.1.

GrahamCampbell avatar Jan 29 '24 18:01 GrahamCampbell

Okay, i've now tested in 1M requests the version 31 of the layer which has OpenSSL1.1.1 - arn:aws:lambda:us-east-1:534081306603:layer:php-82-fpm:31

Only by changing the newest one (58) to the old one I see the performance increase of 50%. That said I really think we should not keep the OpenSSL3 in the production version of the Layer.

As we all know a function running on average in 140ms is 100% more expensive than one running at 70ms.

Screenshot 2024-01-29 at 16 54 56

atrope avatar Jan 29 '24 19:01 atrope

The problem with OpenSSL 1.1.1 is that it has unpatched security vulnerabilities.

GrahamCampbell avatar Jan 29 '24 20:01 GrahamCampbell

Hey @atrope! I've seen your other message, thanks for bumping this issue. Yeah, that is tempting to go back to OpenSSL 1.1.1 😞

it'll probably be around a month from now when OpenSSL 3.2.1 gets put into Bref, in combination with Postgres 15.6

Sounds like the best course of action right now unfortunately

mnapoli avatar Jan 30 '24 19:01 mnapoli

And that time has arrived. 🎉

  1. https://github.com/brefphp/aws-lambda-layers/pull/161
  2. https://github.com/brefphp/aws-lambda-layers/pull/162

Probably we can wait till next week to release this stuff (assuming those PRs are accepted) so that we can also include cURL 8.6.1 and the next round of PHP patch releases, both are due to land next week.

GrahamCampbell avatar Feb 08 '24 11:02 GrahamCampbell

It's out in Bref 2.1.15 https://github.com/brefphp/bref/releases/tag/2.1.15 🎉

I'd love to hear if things are working better with this new version (which uses OpenSSL 3.2).

mnapoli avatar Feb 16 '24 14:02 mnapoli

Hi!

Just tested it out the latest version.

We had an average of 75ms with layer 31 and with layer 61(php-82-fpm) it increased to ~85ms. Tested in the last 15minutes period.

Will leave it for the next days and check how it will perform.

atrope avatar Feb 27 '24 00:02 atrope

What about the performance of layer 60 vs 61? Also, how does P99.9 compare - for me this was the big improvement, assuming you are reusing connections between invokes.

GrahamCampbell avatar Feb 27 '24 08:02 GrahamCampbell