[Package Request] - zlib-ng-compat and all related zlib-ng enhancements for the core OS, from Fedora 40 / CentOS 10 / RHEL 10
zlib-ng-compat - obsoleting core zlib and providing zlib-ng (with Cloudflare and Intel optimizations).
The package is available in EPEL 9 and in Fedora:
https://packages.fedoraproject.org/pkgs/zlib-ng/zlib-ng/epel-9.html
The benefits range from speeding up kernel loading, through gzip support in Amazon Corretto, Python, and Amazon EMR via libhadoop, to accelerating MySQL InnoDB's gzip support.
Hi Team,
We are currently testing and trying to backport the zlib-ng and zlib-ng-compat packages that core Fedora 40/42 ships with the operating system by default, and we will be sharing the RPM builds for AL2023.
We noticed a significant reduction in CPU usage and a speed-up in processing gzip content, especially when using the zstd binary directly to decompress gzip content to disk, which is about 2 times faster than the standard gzip support in the JDK; likewise, gzip support in the JDK backed by zlib-ng-compat is about 2 times faster than standard JDK gzip.
We believe this will be a game changer for Amazon Linux 2023 users, considering it is set to become the mainstream OS for EKS Hybrid on-premises deployments and for core AWS services such as Lambda, Fargate, Aurora, and ECS.
The same applies to S3 clients and the SOCI snapshotter for containerd when compiled with CGO.
Amazon Corretto benefits from this automatically, as it is one of the few official JDK distributions that sources zlib from the operating system; tested with Corretto JDK 21 on Fedora 40/42.
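One quick way to check this on a given host (a hedged sketch: the library path assumes the standard JDK 9+ layout used by Corretto and may differ between builds) is to see whether the JDK's java.util.zip native library is dynamically linked against the system libz:
# Confirm that the JVM's zip/deflate native code resolves libz from the OS,
# which is the library zlib-ng-compat transparently replaces.
ldd "$JAVA_HOME/lib/libzip.so" | grep libz
# Expected output similar to: libz.so.1 => /lib64/libz.so.1 (0x...)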
Official documentation from Fedora and zlib-ng:
https://fedoraproject.org/wiki/Changes/ZlibNGTransition
https://github.com/zlib-ng/zlib-ng/blob/develop/PORTING.md
Tested with official .gz files from public datasets:
https://dumps.wikimedia.org/commonswiki/latest/
Amazon Corretto 21
Code for the test tool (requires at least Java 17 to compile with Maven):
https://github.com/plasticity-cloud/aws-next-gen/tree/main/al2023/zlib-ng-testing/zlibNextGen
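A minimal sketch of building the tool, assuming a standard Maven layout at the path from the repository link above (requires JDK 17+ and Maven):
# Clone the repository and build the test jar used in the runs below.
git clone https://github.com/plasticity-cloud/aws-next-gen.git
cd aws-next-gen/al2023/zlib-ng-testing/zlibNextGen
mvn -q package   # produces target/zlibNextGen-1.0.jar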
Each test was executed:
- on a fresh VM, to avoid disk-caching effects;
- on the same VM, with all caches dropped between runs, to keep the results reliable:
sync; echo 3 > /proc/sys/vm/drop_caches
standard zlib
time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql
real 1m19.458s user 1m0.409s sys 0m17.593s
time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml
real 0m20.122s user 0m13.025s sys 0m6.726s
zlib-ng
time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-page.sql
real 0m59.927s user 0m40.901s sys 0m17.308s
time java -jar target/zlibNextGen-1.0.jar $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml.gz $HOME/datasets/data_commons_wiki/commonswiki-latest-pages-logging2.xml
real 0m14.648s user 0m7.545s sys 0m6.978s
A builder for a standalone zlib-ng version that can be bundled, e.g. for Lambda, is provided in the repo:
https://github.com/plasticity-cloud/aws-next-gen/tree/main/al2023/zlib-ng-testing/zlib-ng-al2023-integration/standalone
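As an illustration only (the install prefix and file names below are assumptions, not the bundle's actual layout), a process such as the JVM can be pointed at the bundled zlib-ng build through the dynamic loader:
# Assumed unpack location of the standalone bundle's compat library; adjust
# to the real path. Any process started from this shell (e.g. the JVM) will
# then resolve libz.so.1 to the zlib-ng build instead of the system zlib.
export LD_LIBRARY_PATH=/opt/zlib-ng/lib:$LD_LIBRARY_PATH
java -jar target/zlibNextGen-1.0.jar input.sql.gz input.sql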
I have also run some of these experiments, and have found good performance improvements in a number of places.
I can't commit to making the change within Amazon Linux 2023, as we do need to balance the risks of such a change within a major version of the operating system.
I am really interested in your experiences with using zlib-ng in place of zlib on AL2023, as that's great input into our decision making.
Hi @stewartsmith, much appreciated for your initial feedback, and I definitely understand that substituting a core OS library requires extra regression testing.
If you could direct me to the public pipelines or regression test suites that your team executes for every release, I would really like to run those.
For the zlib-ng tests, I will be able to share feedback by early next week for Lambda container-based deployments and for EMR on classic EC2 and the ECS AMI.
Regards, Karol
Hi Stewart, apologies for the delays.
Test setup: m6g.xlarge, 80 GB gp3 volume, 125 MB/s throughput, standard IOPS.
- SOCI Snapshotter: it seems that even with stock AL2023 we are not getting the expected results when pulling large images in regular mode (when no SOCI index is available); we are only seeing about a 3-second difference in favour of zlib-ng.
For pulls using a SOCI index, we had to transfer and generate ztoc indexes for the public EMR images, e.g.:
public.ecr.aws/emr-on-eks/spark/emr-7.5.0:latest
When first pulling the image, bootstrapping the container with a SOCI index takes on average 7 seconds, both with and without zlib-ng; bootstrapping the container without a SOCI index takes 60 seconds.
We suspect that by default the SOCI snapshotter does not use CGO bindings (against either standard zlib or zlib-ng) and relies only on the pure Go implementation, despite the build requiring zlib-devel and zlib-static, or the equivalent zlib-ng-compat-devel and zlib-ng-compat-static.
We will be investigating this, as it would be really beneficial for ECS/EKS users on Fargate.
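As a first diagnostic (a sketch only; the binary name is the snapshotter's gRPC daemon, and the interpretation assumes a cgo-based zlib path would be linked dynamically), the binary's linkage shows whether it can reach the system zlib at all:
# A pure-Go build is usually a static executable ("not a dynamic executable"),
# while a cgo build with zlib bindings should list libz.so.1, which is the
# library zlib-ng-compat provides.
ldd "$(command -v soci-snapshotter-grpc)" | grep -E 'libz|not a dynamic'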
Official performance benchmarks:
https://github.com/awslabs/soci-snapshotter/blob/main/docs/benchmark.md#prerequisites
Executed on both setups; results attached:
soci-snapshotter-performanceTest_zlib_standard.zip
soci-snapshotter-performanceTest_zlib_ng.zip
Test on m6.large, decompressing on the same volume using zstd with zlib/zlib-ng-compat bindings: CPU-wise, zlib-ng performs at least 2 times better than stock zlib, leaving enough headroom for other preprocessing tasks.
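For reference, a minimal form of such a run (the file name is just an example, and this assumes the installed zstd was built with zlib support, which is what lets it pick up zlib-ng-compat):
# Drop page caches so repeated runs are comparable, then time gzip
# decompression through zstd's zlib-backed gzip format support.
sync; echo 3 > /proc/sys/vm/drop_caches
time zstd -d --format=gzip commonswiki-latest-page.sql.gz -o commonswiki-latest-page.sql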
The code to build the zlib-ng RPMs is hosted in the following repository:
https://github.com/plasticity-cloud/aws-next-gen/tree/main/al2023/zlib-ng-testing/zlib-ng-al2023-integration/os-bundle
After checkout, to build the RPMs locally:
cd al2023/zlib-ng-testing/zlib-ng-al2023-integration/os-bundle/
./rpm-builder-standalone.sh ./zlib-ng-build-standalone.sh
This will output a tar.gz bundle with the RPMs under:
./releases/latest
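To try the resulting packages (a sketch only; the bundle contents and RPM layout under ./releases/latest are assumptions about the builder's output):
# Unpack the bundle and install the RPMs, letting dnf handle the
# zlib -> zlib-ng-compat replacement.
mkdir -p /tmp/zlib-ng-rpms
tar -xzf ./releases/latest/*.tar.gz -C /tmp/zlib-ng-rpms
sudo dnf install /tmp/zlib-ng-rpms/*.rpm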