Add Support for Google Cloud A4 and A4X Machine Types

Open hungnphan opened this issue 4 months ago • 2 comments

Summary

This PR adds support for the newly released Google Cloud Compute Engine machine types A4 and A4X and their NVIDIA GPU models (B200 and GB200). It also adds the NVIDIA H200 GPU model used in A3 Ultra.

Motivation

Google Cloud recently announced the A4 and A4X machine series, which are built on the NVIDIA Blackwell GPU architecture. These accelerator-optimized machine types are designed for foundation model training and serving.

Reference: https://cloud.google.com/compute/docs/gpus/

Changes Made

New Machine Type Configurations

1. A4 Machine Series (instances/series/a4.sql)

  • Family: Accelerator-optimized
  • GPU: NVIDIA B200 Blackwell GPUs
  • CPU Platform: Sapphire Rapids
  • Local SSD: 12,000 GiB
  • Network Bandwidth: 3,600 Gbps
  • Spot VM Support: Enabled
  • Machine Type: a4-highgpu-8g
    • 224 vCPUs
    • 3,968 GB memory
    • 8x NVIDIA B200 GPUs (1,440 GB total GPU memory)

2. A4X Machine Series (instances/series/a4x.sql)

  • Family: Accelerator-optimized
  • GPU: NVIDIA GB200 Grace Blackwell Superchips
  • CPU Platform: ARM Neoverse V2
  • Local SSD: 12,000 GiB
  • Network Bandwidth: 2,000 Gbps
  • ARM Architecture: Supported
  • Spot VM Support: Enabled
  • Machine Type: a4x-highgpu-4g
    • 140 vCPUs
    • 884 GB memory
    • 4x NVIDIA GB200 GPUs (720 GB total GPU memory)
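
As a quick sanity check on the GPU memory totals listed for both series, the per-GPU figure can be derived by dividing the total by the GPU count. The sketch below uses only the numbers from the two lists above; the dictionary layout and the function name are illustrative assumptions and are not part of the repository's SQL files.

```python
# Illustrative only: figures are copied from the spec lists above.
# The SPECS layout and memory_per_gpu() are hypothetical helpers.
SPECS = {
    "a4-highgpu-8g": {"gpus": 8, "gpu_memory_gb": 1440},
    "a4x-highgpu-4g": {"gpus": 4, "gpu_memory_gb": 720},
}


def memory_per_gpu(machine_type: str) -> float:
    """Return the GPU memory per GPU in GB for a listed machine type."""
    spec = SPECS[machine_type]
    return spec["gpu_memory_gb"] / spec["gpus"]


for name in SPECS:
    print(f"{name}: {memory_per_gpu(name):.0f} GB per GPU")
```

Both machine types work out to the same per-GPU memory, which is consistent with the totals given in the lists.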

GPU Model Support

Added support for the following NVIDIA GPU models in instances/series/gpu/gpu_names.sql:

  • NVIDIA H200 141GB (nvidia-h200-141gb) - Used in A3 Ultra
  • NVIDIA B200 (nvidia-b200) - Used in A4
  • NVIDIA GB200 (nvidia-gb200) - Used in A4X
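
For readers who want to see the shape of this mapping without opening the SQL file, here is a minimal Python sketch of the same identifier-to-name pairs. The identifiers and display names are taken from the list above; the dictionary, the function name, and the fallback behaviour are assumptions made for illustration and do not reflect how gpu_names.sql is actually written.

```python
# Hypothetical Python mirror of the mapping described above.
# Keys and values come from the PR description; everything else is illustrative.
GPU_NAMES = {
    "nvidia-h200-141gb": "NVIDIA H200 141GB",  # used in A3 Ultra
    "nvidia-b200": "NVIDIA B200",              # used in A4
    "nvidia-gb200": "NVIDIA GB200",            # used in A4X
}


def gpu_display_name(identifier: str) -> str:
    """Return the display name, falling back to the raw identifier if unknown."""
    return GPU_NAMES.get(identifier, identifier)


print(gpu_display_name("nvidia-b200"))
```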

Documentation Updates

Updated instances/README.md to:

  • Add A4 and A4X to the machine types list
  • Fix A3 link (was incorrectly pointing to a2.sql)
  • Update resources section to reference A3, A4, and A4X accelerator-optimized machines

Testing

All SQL files follow the existing project patterns and schema:

  • Consistent formatting with existing machine type configurations
  • Proper series and family classification
  • Accurate specifications from official Google Cloud documentation

Checklist

  • [x] Created new SQL configuration files for A4 and A4X machine types
  • [x] Updated GPU names mapping for new NVIDIA models
  • [x] Updated documentation to reflect new machine types
  • [x] Followed existing code style and patterns
  • [x] All changes are based on official Google Cloud documentation
  • [x] Clear and descriptive commit messages

Additional Notes

These machine types represent Google Cloud's latest offerings for AI/ML workloads:

  • A4 is optimized for foundation model training and serving with NVIDIA B200 GPUs
  • A4X features GB200 Grace Blackwell Superchips combining ARM CPUs with B200 GPUs for exascale AI computing

Both machine types require capacity reservation or specific provisioning methods as outlined in the Google Cloud documentation.

hungnphan • Oct 19 '25 17:10

Thanks for the pull request. As I understand it, A4 and A3 are only available if your account has been specially activated and you have a separate contract. Am I right? If so, how can we calculate the list price?

Please see: https://github.com/Cyclenerd/google-cloud-pricing-cost-calculator/issues/279 and https://github.com/Cyclenerd/google-cloud-pricing-cost-calculator/issues/309

Cyclenerd • Oct 22 '25 13:10

Note: The machine type a4x-highgpu-4g is not published via the Google Compute API at the moment.

Cyclenerd • Oct 22 '25 13:10
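
The API availability concern raised in the last comment could be checked mechanically by comparing the machine types a PR adds against the names the Compute API actually returns. The sketch below is one way to do that, assuming the published names have already been fetched (for example with `gcloud compute machine-types list --format="value(name)"`). The function name and the sample `published` list are hypothetical and are not taken from the thread.

```python
def missing_from_api(expected, published):
    """Return the expected machine types absent from the published names, sorted."""
    return sorted(set(expected) - set(published))


# Hypothetical sample data: in practice `published` would come from the Compute API.
published = ["a3-ultragpu-8g", "a4-highgpu-8g"]
expected = ["a4-highgpu-8g", "a4x-highgpu-4g"]

print(missing_from_api(expected, published))
```

A non-empty result would flag machine types that cannot yet be matched against API data.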