amazon-linux-2023

[Package Request] - zlib-ng-compat and all related zlib-ng enhancements for core OS, from Fedora 40/CentOS 10/Redhat 10

Open plasticity-cloud opened this issue 1 year ago • 7 comments

zlib-ng-compat

Could zlib-ng-compat obsolete core zlib and provide zlib-ng (with Cloudflare and Intel optimizations)?

The package is available in Fedora and in EPEL 9:

https://packages.fedoraproject.org/pkgs/zlib-ng/zlib-ng/epel-9.html

The benefits range from faster kernel loading, to gzip support in Amazon Corretto and Python, to Amazon EMR via libhadoop, to accelerating MySQL InnoDB with gzip support.

Hi Team, we are currently testing and trying to backport the core Fedora 40/42 packages that ship with the operating system by default and use zlib-ng and zlib-ng-compat.

We will share the RPM build for AL2023.

We noticed a significant reduction in CPU usage and a speed-up in processing gzip content. Using the zstd binary directly to decompress gzip content to disk was 2 times faster than the standard gzip support in the JDK, and gzip support in the JDK with zlib-ng-compat was 2 times faster than standard gzip in the JDK.

We believe this will be a game changer for Amazon Linux 2023 users, considering it would become the mainstream OS for EKS Hybrid on premises and also for core AWS services such as Lambda, Fargate, Aurora and ECS.

The same applies to S3 clients and the SOCI snapshotter for containerd, when compiled with CGO.

Amazon Corretto benefits from it automatically, as it is one of the rare official JDK distributions that source zlib from the operating system. Tested with Corretto JDK 21 on Fedora 40/42.
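One way to check whether a runtime actually picks up the operating system's zlib is to ask it which zlib version it loaded. The sketch below does this for Python (listed above as another beneficiary), assuming the interpreter links zlib dynamically, as distribution builds usually do. The `"zlib-ng"` substring test is an assumption based on zlib-ng's compat mode reporting a version string such as `1.3.1.zlib-ng`; `describe_zlib` is a name introduced here for illustration.

```python
import zlib


def describe_zlib():
    """Report the zlib this interpreter was built against and the one loaded now."""
    runtime = zlib.ZLIB_RUNTIME_VERSION  # version of the library loaded at run time
    return {
        "compiled_against": zlib.ZLIB_VERSION,  # version seen at build time
        "runtime": runtime,
        # Assumption: zlib-ng in compat mode embeds "zlib-ng" in its version string.
        "looks_like_zlib_ng": "zlib-ng" in runtime,
    }


if __name__ == "__main__":
    print(describe_zlib())
```

Running this before and after installing zlib-ng-compat should show whether the interpreter switched libraries without being rebuilt.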

Official documents from Fedora and zlib-ng:

https://fedoraproject.org/wiki/Changes/ZlibNGTransition

https://github.com/zlib-ng/zlib-ng/blob/develop/PORTING.md

plasticity-cloud • Jan 04 '25 16:01

Tested with official gz files from public datasets:

https://dumps.wikimedia.org/commonswiki/latest/

Amazon Corretto 21

Code for the test tool (requires at least Java 17 to compile with Maven):

https://github.com/plasticity-cloud/aws-next-gen/tree/main/al2023/zlib-ng-testing/zlibNextGen

Each test was executed:

  1. on a fresh VM, to avoid the effects of disk caching;

  2. on the same VM, after dropping all caches to keep the tests reliable:

sync; echo 3 > /proc/sys/vm/drop_caches
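The cache-dropping step can also be scripted so that every timed run starts cold. This is a minimal sketch, not the harness used for the numbers in this thread; `drop_caches` and `timed_run` are hypothetical helper names, and writing to `/proc/sys/vm/drop_caches` requires root.

```python
import os
import subprocess
import sys
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_caches(path=DROP_CACHES):
    """Flush dirty pages, then ask the kernel to drop page cache, dentries and inodes."""
    os.sync()                    # same effect as the `sync` command
    with open(path, "w") as fh:  # needs root for the real /proc file
        fh.write("3\n")


def timed_run(cmd):
    """Run cmd (a list of arguments) and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (as root): python3 bench.py <command> [args...]
    drop_caches()
    print(f"{timed_run(sys.argv[1:]):.3f}s")
```

Unlike the shell `time` builtin, this only reports wall-clock time; user and sys times would need `os.times()` or `resource.getrusage`.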

standard zlib

time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql

real 1m19.458s user 1m0.409s sys 0m17.593s

time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml

real 0m20.122s user 0m13.025s sys 0m6.726s

zlib-ng

time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql

real 0m59.927s user 0m40.901s sys 0m17.308s

time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml

real 0m14.648s user 0m7.545s sys 0m6.978s
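To turn the `time` output above into speed-up ratios, the values can be converted to seconds and divided. The sketch below uses only the figures quoted in this comment; `to_seconds`, `speedup` and `RUNS` are names introduced here for illustration.

```python
import re


def to_seconds(stamp):
    """Convert a bash `time` value such as '1m19.458s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def speedup(baseline, candidate):
    """Ratio of baseline time to candidate time; above 1 means the candidate is faster."""
    return to_seconds(baseline) / to_seconds(candidate)


# Timings copied from the runs above: (file, metric) -> (standard zlib, zlib-ng)
RUNS = {
    ("commonswiki-latest-page.sql.gz", "real"): ("1m19.458s", "0m59.927s"),
    ("commonswiki-latest-page.sql.gz", "user"): ("1m0.409s", "0m40.901s"),
    ("commonswiki-latest-pages-logging2.xml.gz", "real"): ("0m20.122s", "0m14.648s"),
    ("commonswiki-latest-pages-logging2.xml.gz", "user"): ("0m13.025s", "0m7.545s"),
}

if __name__ == "__main__":
    for (name, metric), (zlib_time, zlib_ng_time) in RUNS.items():
        print(f"{name} {metric}: {speedup(zlib_time, zlib_ng_time):.2f}x")
```

On these figures the Java test tool is roughly 1.3x to 1.4x faster in wall-clock time and 1.5x to 1.7x faster in user CPU time with zlib-ng, while sys time is essentially unchanged.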

plasticity-cloud • Jan 04 '25 17:01

A builder for a standalone zlib-ng version that can be bundled, e.g. for Lambda, is provided in the repo:

https://github.com/plasticity-cloud/aws-next-gen/tree/main/al2023/zlib-ng-testing/zlib-ng-al2023-integration/standalone

plasticity-cloud • Jan 05 '25 22:01

I have also run some of these experiments, and have found good performance improvements in a number of places.

I can't commit to making the change within Amazon Linux 2023, as we do need to balance risks of such a change within a major version of the Operating System.

I am really interested in your experiences with using zlib-ng in place of zlib on AL2023, as that's great input into our decision making.

stewartsmith • Jan 09 '25 19:01

Hi @stewartsmith, thank you for your initial feedback. I understand that substituting a core OS library requires extra regression testing.

If you could direct me to the public pipelines or regression test suite that your team executes for every release, I would be glad to run them.

For the zlib-ng tests, I will be able to share feedback by early next week for Lambda container-based deployment, and for EMR on classic EC2 and the ECS AMI.

Regards, Karol

plasticity-cloud • Jan 10 '25 23:01

Hi Stewart, apologies for the delays.

Test setup: m6g.xlarge, 80GB GP3, Throughput 125MB/s, Standard IOPS,

  1. SOCI Snapshotter: even with stock AL2023, we are not getting the expected results when pulling large images in regular mode, when a SOCI index is not available;

we are getting 3 seconds difference in favour

For pulls that use a SOCI index, we had to transfer the images and generate ztoc indexes for them, e.g. for public EMR images:

public.ecr.aws/emr-on-eks/spark/emr-7.5.0:latest

When first pulling the image: bootstrapping the container with a SOCI index takes 7 seconds on average, both with and without zlib-ng; bootstrapping the container without a SOCI index takes 60 seconds.

We suspect that by default the SOCI snapshotter does not use CGO bindings (to standard zlib or zlib-ng) and relies only on Go bindings, despite having a build requirement on zlib-devel and zlib-static, or equivalently on zlib-ng-compat-devel and zlib-ng-compat-static.

We will investigate this, as it would be really beneficial for ECS/EKS users on Fargate.
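A first step in that investigation could be to read the build settings that the Go toolchain embeds in a binary. The sketch below wraps `go version -m`, which prints `build KEY=VALUE` lines for binaries built with Go 1.18 or later; it assumes the Go toolchain is installed, and `parse_build_settings` and `cgo_enabled` are hypothetical helper names. Note that `CGO_ENABLED=1` only shows that cgo was available at build time, not that decompression actually goes through the C zlib.

```python
import subprocess


def parse_build_settings(text):
    """Extract `build KEY=VALUE` lines from `go version -m` output into a dict."""
    settings = {}
    for line in text.splitlines():
        fields = line.strip().split(None, 1)  # ["build", "KEY=VALUE"]
        if len(fields) == 2 and fields[0] == "build" and "=" in fields[1]:
            key, _, value = fields[1].partition("=")
            settings[key] = value
    return settings


def cgo_enabled(binary_path):
    """Return True/False if the binary records CGO_ENABLED, None if it does not."""
    out = subprocess.run(
        ["go", "version", "-m", binary_path],
        check=True, capture_output=True, text=True,
    ).stdout
    value = parse_build_settings(out).get("CGO_ENABLED")
    return None if value is None else value == "1"
```

Calling `cgo_enabled` on the installed snapshotter binary (its path depends on the installation) would confirm or rule out the simplest explanation before profiling further.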

Official performance benchmarks:

https://github.com/awslabs/soci-snapshotter/blob/main/docs/benchmark.md#prerequisites

executed on both setups:

soci-snapshotter-performanceTest_zlib_standard.zip soci-snapshotter-performanceTest_zlib_ng.zip

plasticity-cloud • Jan 20 '25 00:01

Test on m6.large, decompressing on the same volume using zstd with zlib/zlib-ng-compat bindings.

CPU-wise, zlib-ng performs at least 2 times better than stock zlib, leaving enough headroom for other preprocessing tasks:

cpu_stats_decompress_gp3_m6_large.zip

plasticity-cloud • Jan 20 '25 00:01

The code to build the zlib-ng RPMs is hosted in the following repository:

https://github.com/plasticity-cloud/aws-next-gen/tree/main/al2023/zlib-ng-testing/zlib-ng-al2023-integration/os-bundle

After checkout, to build the RPMs locally:

cd al2023/zlib-ng-testing/zlib-ng-al2023-integration/os-bundle/

./rpm-builder-standalone.sh ./zlib-ng-build-standalone.sh

This outputs a tar.gz bundle with the RPMs in:

./releases/latest

plasticity-cloud • Jan 20 '25 00:01