[batch] expose job cloud location to input, main, and output containers

Open · danking opened this issue 5 months ago · 1 comment

What happened?

Batch should expose a job's cloud location to the job. In particular, now that multi-regional buckets charge egress, users needing large numbers of cores will need to manually duplicate their data in multiple regions and then choose the correct data source based on the region in which the job is scheduled.

The implementor should consider other options, but here is an initial proposal:

  1. Input and output files become dictionaries mapping from location to input/output. (If the job's location is not a key in the dictionary, the job fails.)
  2. Main container's file system and environment are populated with information about the location.
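
A minimal sketch of how a job might consume the two proposals above. Everything here is hypothetical and not part of the hail Batch API: the helper names `resolve_for_location` and `job_location`, the `LocationNotFound` exception, the environment variable `HAIL_BATCH_LOCATION`, and the bucket URLs are illustrative assumptions only.

```python
import os
from typing import Mapping


class LocationNotFound(Exception):
    """Hypothetical: raised when the job's location is missing from a location-keyed mapping."""


def resolve_for_location(files_by_location: Mapping[str, str], location: str) -> str:
    """Pick the file for the job's location (proposal 1); fail if the location is absent."""
    try:
        return files_by_location[location]
    except KeyError:
        raise LocationNotFound(
            f"no file configured for location {location!r}; "
            f"available: {sorted(files_by_location)}"
        ) from None


def job_location(env: Mapping[str, str] = os.environ) -> str:
    """Read the job's location from the environment (proposal 2).

    HAIL_BATCH_LOCATION is a hypothetical variable name, not an existing Batch feature.
    """
    return env["HAIL_BATCH_LOCATION"]


if __name__ == "__main__":
    # Hypothetical per-region copies of the same dataset.
    inputs = {
        "us-central1": "gs://my-bucket-us-central1/data.vcf",
        "europe-west1": "gs://my-bucket-europe-west1/data.vcf",
    }
    print(resolve_for_location(inputs, "us-central1"))
```

Failing loudly on a missing key (rather than falling back to some default region) matches the proposal's rule that the job fails, and avoids silently incurring cross-region egress.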

The implementor should consider whether to expose the region, the zone, or both in GCP, and likewise the region, the availability zone (AZ), or both in Azure.
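
One consideration for the region-vs-zone question: GCP zone names have the form `<region>-<letter>` (e.g. `us-central1-a`), so a job given only its zone can derive its region, but not the reverse. Azure AZs are identified by a number within a region (e.g. `1`), so there the zone alone does not identify the region. The helper below, `gcp_region_from_zone`, is a hypothetical illustration, not part of hail.

```python
def gcp_region_from_zone(zone: str) -> str:
    """Derive a GCP region from a zone name, e.g. 'us-central1-a' -> 'us-central1'.

    Hypothetical helper: assumes the standard '<region>-<single letter>' zone format.
    """
    region, sep, suffix = zone.rpartition("-")
    if not sep or not region or len(suffix) != 1 or not suffix.isalpha():
        raise ValueError(f"not a GCP zone name: {zone!r}")
    return region


if __name__ == "__main__":
    print(gcp_region_from_zone("us-central1-a"))
```

Under this reasoning, exposing the zone would suffice in GCP, while Azure would need both the region and the AZ.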

References

  • https://hail.zulipchat.com/#narrow/stream/127527-team/topic/batch.20cluster/near/417261935

Version

0.2.127

Relevant log output

No response

danking, Jan 22 '24 17:01