
feat: Add no-customization.sh for disk usage metrics

Open cjac opened this issue 8 months ago • 0 comments

This commit introduces examples/secure-boot/no-customization.sh, a new script designed primarily for collecting disk usage metrics during custom image builds. The script addresses the need for detailed disk analysis, which is critical to keeping Dataproc images lightweight and optimized, as emphasized in the recent Dataproc 2.3 release, which focused on reduced CVEs and smaller image footprints.

During the development and review of the Dataproc 2.3 custom images, this no-customization.sh script, along with similar disk metric collection logic in install_gpu_driver.sh, was instrumental. These tools allowed for precise measurement of disk consumption at various stages of image creation. The data gathered directly informed decisions regarding package inclusions and default disk sizes, contributing significantly to the ~70% reduction in open-source software components and the ~50% reduction in total CVEs observed in Dataproc 2.3 images.

The metrics captured by this script (and by the install_gpu_driver.sh exit handler) were fed back into pre-init.sh and other image generation orchestrators, allowing for iterative refinement and validation of the image size. This continuous feedback loop was crucial for achieving the lightweight and compliant image goals for Dataproc 2.3. Including this script in the repository makes this diagnostic tool available for future image optimization efforts, which is especially relevant for new AI/ML images and subsequent releases.

The no-customization.sh script reuses existing patterns for disk usage monitoring and cleanup within the custom images repository: df to read filesystem usage, perl for metric calculation, and the dd command for zeroing free space when the creating-image metadata attribute is present.
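The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the contents of no-customization.sh: the function names (`pct_used`, `report_usage`, `creating_image`, `zero_free_space`), the `ZERO_FREE_SPACE` variable, and the `/zero` file path are all hypothetical, awk stands in for the perl calculation mentioned above, and reading the `creating-image` attribute via curl from the GCE metadata server is an assumption about how the attribute is checked.

```shell
#!/bin/sh
# Hypothetical sketch: names below are illustrative, not taken from no-customization.sh.
set -eu

# Percentage of a filesystem in use, to one decimal place.
# Arguments: used_kb total_kb. awk is used here in place of perl.
pct_used() {
  awk -v used="$1" -v total="$2" 'BEGIN {
    if (total <= 0) { print "0.0"; exit }
    printf "%.1f\n", (used / total) * 100
  }'
}

# Print one metric line for a mount point from POSIX-format df output.
report_usage() {
  mount=$1
  # df -P -k columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
  line=$(df -P -k "$mount" | awk 'NR == 2 { print $2, $3 }')
  total=${line% *}
  used=${line#* }
  printf 'mount=%s total_kb=%s used_kb=%s pct_used=%s\n' \
    "$mount" "$total" "$used" "$(pct_used "$used" "$total")"
}

# Succeeds only if the instance has a creating-image metadata attribute.
# Assumption: the attribute is read from the GCE metadata server with curl.
creating_image() {
  curl -sf --max-time 2 -H 'Metadata-Flavor: Google' \
    'http://metadata.google.internal/computeMetadata/v1/instance/attributes/creating-image' \
    >/dev/null
}

# Fill free space with zeros, then delete the file, so the image compresses better.
zero_free_space() {
  target=${1:-/zero}
  # dd exits non-zero once the disk is full; that is the expected outcome here.
  dd if=/dev/zero of="$target" bs=1M 2>/dev/null || true
  sync
  rm -f "$target"
}

report_usage /
# Zeroing is opt-in (hypothetical ZERO_FREE_SPACE switch) and only runs during image creation.
if [ "${ZERO_FREE_SPACE:-0}" = 1 ] && creating_image; then
  zero_free_space
  report_usage /
fi
```

Emitting one `key=value` line per measurement keeps the output easy to grep from build logs and compare across build stages. Zeroing is gated twice in this sketch so that running it on an ordinary machine only reports usage and never fills the disk.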

cjac • Jun 20 '25 22:06