[batch] Complete the transition from GSA key files to the Batch metadata server

Open daniel-goldstein opened this issue 2 months ago • 0 comments

What happened?

See Batch Metadata Server RFC for background. The objective of this issue is to fully remove GSA key files from Batch job filesystems, preventing possible exfiltration of long-lived credentials.

Each remaining task should get its own issue if there isn't already one. Breakdown of tasks:

  • [X] Implement a Batch metadata server and expose it in GCP DockerJobs (#14019)
  • [ ] Add metadata server support for JVMJobs aka Query-on-Batch in GCP (#14487)
  • [ ] Add metadata server support in Azure
  • [ ] Deprecate and remove support for key files in DockerJobs
  • [ ] Deprecate and remove support for key files in JVMJobs. This requires dropping support for old versions of Hail that depend on the key file (at least all versions up to and including 0.2.130)
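
From a job's point of view, the tasks above replace reading a key file with an HTTP call for a short-lived token. A minimal sketch of that call, assuming the Batch metadata server mirrors the standard GCE metadata token endpoint. The URL path, the `Metadata-Flavor` header, and the response fields are GCE's; the helper names are hypothetical, and whether Batch serves exactly this path is an assumption (see the RFC for the actual design):

```python
import json
import urllib.request

# Standard GCE metadata token endpoint. Assumption: the Batch metadata
# server answers on the same host and path from inside a job.
TOKEN_URL = (
    "http://metadata.google.internal/computeMetadata/v1/"
    "instance/service-accounts/default/token"
)


def build_token_request() -> urllib.request.Request:
    """Build the request a job would send instead of reading a key file."""
    # GCE metadata servers reject requests that lack this header.
    return urllib.request.Request(TOKEN_URL, headers={"Metadata-Flavor": "Google"})


def parse_token_response(body: bytes) -> tuple[str, int]:
    """Extract (access_token, expires_in_seconds) from the JSON response."""
    payload = json.loads(body)
    return payload["access_token"], int(payload["expires_in"])


def fetch_access_token(timeout: float = 5.0) -> tuple[str, int]:
    """Fetch a short-lived token; only works where a metadata server is reachable."""
    with urllib.request.urlopen(build_token_request(), timeout=timeout) as resp:
        return parse_token_response(resp.read())
```

Because the token expires, a leaked copy is far less useful than a leaked key file, which is the point of the migration.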

Completing these steps gets us past the security milestone: GSA key files are no longer exposed to jobs, so they can no longer be exfiltrated from them. We might be able to go even further and get rid of key files entirely, which would reduce our operational burden of securing and rotating them.

  • [ ] In GCP, use Service Account Impersonation to have the Batch Worker identity impersonate user GSAs, allowing it to create metadata server access tokens without the key files themselves
  • [ ] In Azure, investigate whether something like the above is even possible. At the time of writing, there does not appear to be any alternative to either storing credentials or adding users to the VM's metadata server. It is unclear whether the latter can be done dynamically, or how frequently, and it does not feel like the intended use case.
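
For the GCP item above, the IAM Credentials API's `generateAccessToken` method is how one identity mints a short-lived token for a service account it is allowed to impersonate (the caller needs the Service Account Token Creator role on the target). A rough sketch of the request a worker might send. The function names, the example email, and the choice to call the REST endpoint directly instead of going through a client library are assumptions for illustration, not the planned implementation:

```python
import json
import urllib.request

# REST endpoint of the IAM Credentials API; "-" is the wildcard project.
IAM_CREDENTIALS_URL = (
    "https://iamcredentials.googleapis.com/v1/projects/-/"
    "serviceAccounts/{email}:generateAccessToken"
)


def build_impersonation_request(
    worker_token: str,
    user_gsa_email: str,
    scopes: list[str],
    lifetime_seconds: int = 3600,
) -> urllib.request.Request:
    """Build a request asking for a short-lived token for `user_gsa_email`.

    `worker_token` is the worker's own access token (hypothetical input here);
    no key file for the user GSA is involved.
    """
    body = json.dumps(
        {"scope": list(scopes), "lifetime": f"{lifetime_seconds}s"}
    ).encode("utf-8")
    return urllib.request.Request(
        IAM_CREDENTIALS_URL.format(email=user_gsa_email),
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {worker_token}",
            "Content-Type": "application/json",
        },
    )


def parse_impersonation_response(body: bytes) -> tuple[str, str]:
    """Extract (access_token, RFC 3339 expiry time) from the JSON response."""
    payload = json.loads(body)
    return payload["accessToken"], payload["expireTime"]
```

The metadata server could then hand the returned token to the job exactly as it would a token derived from a key file, so jobs would not need to change.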

Version

0.2.130

Relevant log output

No response

daniel-goldstein, Apr 18 '24 20:04