[Bug] - Performance much lower than AL2
Describe the bug
We are running a set of Apache + PHP web servers on an ECS cluster backed by an Auto Scaling group of t3.micro instances. For several months we have been running one of the ECS-optimised AMIs provided by AWS, specifically amzn2-ami-ecs-kernel-5.10-hvm-2.0.20250515-x86_64-ebs. Last week we upgraded the AMI to al2023-ami-ecs-hvm-2023.0.20251108-kernel-6.1-x86_64 and immediately saw a measurable drop in API performance: CPU usage went up and response times got longer.
To Reproduce
Run an Apache server on each of the two AMIs and measure response time and CPU usage.
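A minimal sketch of the response-time half of this comparison, assuming Python 3.9+ on the client machine: it sends a fixed number of GET requests from a small thread pool and reports the mean and nearest-rank p50/p95/p99 latency. The function names and the default `total`/`concurrency` values are illustrative assumptions, not part of any existing tool.

```python
import math
import statistics
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def timed_get(url: str, timeout: float = 10.0) -> float:
    """Issue one GET request and return its wall-clock latency in milliseconds."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # include the body transfer in the measured time
    return (time.perf_counter() - start) * 1000.0


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def run_load(url: str, total: int = 1000, concurrency: int = 8) -> dict[str, float]:
    """Send `total` GETs using `concurrency` worker threads and summarize latency."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed_get(url), range(total)))
    return {
        "count": len(latencies),
        "mean_ms": statistics.fmean(latencies),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
        "p99_ms": percentile(latencies, 99),
    }
```

Calling `run_load(...)` with the same URL path, request count and concurrency against a task on each AMI in turn gives directly comparable latency figures; the target URL is whatever endpoint you choose to test.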
Expected behavior
Performance metrics should not get worse after the AMI upgrade.
Additional context
Please let me know if any additional information is required. Thanks in advance.
Are the requests over HTTPS? Does the PHP application use any specific modules/libraries or connect to remote resources? Do you have any resource constraints set on your ECS containers? Is the container also AL2023?
Requests are over HTTPS, but TLS is terminated at the Application Load Balancer; from there it is plain HTTP through to the tasks. PHP only uses the default PDO library to connect to our database (Aurora MySQL on RDS), plus the AWS SDK for PHP to access other AWS services, e.g. S3, SNS and Lambda, depending on the endpoint. As for resource constraints: we run one task per t3.micro instance, with (I believe) 800 MiB of RAM and the full 2 vCPUs. The container is running Ubuntu 24.04 LTS. Thanks
Just to make sure we understand the context correctly: all of the application-level code runs inside the Ubuntu container, which is identical between the good and bad cases, and the main difference is the host operating system, right? That would tend to point toward the 6.1 kernel (and the associated container-related devices: is it using veth to bridge networking into the containers? Sorry, I'm not very familiar with how ECS works).
Do you have a way to set up some kind of reproduction environment (with a synthetic/test workload), i.e. non-production, so we can experiment a bit?
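For the CPU side of such a synthetic test, one option is to sample the aggregate `cpu` line of `/proc/stat` on the host before and after an interval and compute the non-idle fraction. This is a sketch, assuming it runs on the Linux host itself rather than inside the container; the helper names are hypothetical. It counts `idle + iowait` as idle and leaves out the `guest`/`guest_nice` columns, which the kernel already includes in `user`/`nice`.

```python
import time


def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    fields = line.split()
    if fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in fields[1:]]
    # Columns: user nice system idle iowait irq softirq steal guest guest_nice.
    idle = values[3] + (values[4] if len(values) > 4 else 0)  # idle + iowait
    total = sum(values[:8])  # exclude guest/guest_nice (already in user/nice)
    return idle, total


def busy_fraction(before: str, after: str) -> float:
    """Fraction of CPU time spent non-idle between two 'cpu' line snapshots."""
    idle0, total0 = parse_cpu_line(before)
    idle1, total1 = parse_cpu_line(after)
    d_total = total1 - total0
    if d_total <= 0:
        return 0.0
    busy = 1.0 - (idle1 - idle0) / d_total
    # Clamp: iowait is documented as unreliable and can even decrease.
    return min(1.0, max(0.0, busy))


def read_cpu_line(path: str = "/proc/stat") -> str:
    with open(path) as f:
        return f.readline()  # the first line is the aggregate "cpu" line


def sample_busy(interval_s: float = 1.0) -> float:
    """Average host-wide CPU busy fraction over `interval_s` seconds."""
    before = read_cpu_line()
    time.sleep(interval_s)
    return busy_fraction(before, read_cpu_line())
```

Calling `sample_busy()` in a loop while the same synthetic load runs against each AMI gives a host-level CPU figure to set next to the response times.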
It would be useful to see whether instance memory size affects performance, and also to compare kernel 6.1 with 6.12. I've asked our kernel engineers to have a look as well and see if they can suggest something, but if you could provide some kind of minimal reproduction case (maybe a dummy web server in the container) that we could use to diagnose this, that would definitely speed things up.
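A dummy server along these lines might look like the following minimal sketch, assuming Python 3.7+ in the container image. It is a stand-in for the Apache + PHP stack, not a model of it: each request does a fixed amount of deterministic CPU work (iterated SHA-256) so the same request costs the same on both AMIs. `WORK_ROUNDS`, `DummyHandler` and `make_server` are hypothetical names.

```python
import hashlib
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical knob: hash rounds per request, standing in for per-request
# PHP CPU work. Tune it so that one request takes a few milliseconds.
WORK_ROUNDS = 20_000


def burn_cpu(rounds: int) -> str:
    """Deterministic CPU-bound work: iterate SHA-256 `rounds` times."""
    digest = b"seed"
    for _ in range(rounds):
        digest = hashlib.sha256(digest).digest()
    return digest.hex()


class DummyHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        body = burn_cpu(WORK_ROUNDS).encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging so it does not add CPU noise


def make_server(port: int = 8080) -> ThreadingHTTPServer:
    """Bind a threaded HTTP server on all interfaces; call serve_forever() to run it."""
    return ThreadingHTTPServer(("0.0.0.0", port), DummyHandler)
```

A container entrypoint could call `make_server(8080).serve_forever()`. Because of CPython's GIL, the hashing runs on one core at a time, which keeps the workload simple and repeatable but differs from Apache's multi-process model.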