[batch] expose job cloud location to input, main, and output containers
What happened?
Batch should expose a job's cloud location (region and/or zone) to the job itself. In particular, now that multi-regional buckets charge egress, users who need large numbers of cores will have to manually duplicate their data across multiple regions and then choose the correct data source based on the region in which the job is scheduled.
The implementor should consider other options, but here is an initial proposal:
- Input and output files become dictionaries mapping from location to input/output. (If the job's location is not found in the mapping, the job fails.)
- The main container's file system and environment are populated with information about the location.
The implementor should consider whether region, zone, or both should be exposed in GCP; likewise for Azure regions and availability zones.
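To illustrate the proposal, here is a minimal sketch of how a job might resolve its input from a per-region mapping. The `HAIL_BATCH_REGION` environment variable name, the dictionary shape, and the failure behavior are all assumptions for illustration, not an existing Batch API:

```python
import os

# Hypothetical per-region input mapping, as in the proposal above.
# Bucket names are placeholders.
INPUTS_BY_REGION = {
    'us-central1': 'gs://my-data-us-central1/input.vcf.gz',
    'europe-west1': 'gs://my-data-europe-west1/input.vcf.gz',
}


def resolve_input(region: str) -> str:
    """Pick the input co-located with the region the job was scheduled in."""
    try:
        return INPUTS_BY_REGION[region]
    except KeyError:
        # Per the proposal, a job whose location has no entry fails.
        raise RuntimeError(f'no input configured for region {region!r}')


if __name__ == '__main__':
    # Assumed mechanism: Batch populates the environment with the job's
    # location before the main container starts.
    region = os.environ['HAIL_BATCH_REGION']
    print(resolve_input(region))
```

The same lookup would apply symmetrically to outputs, so results land in a bucket co-located with the job and avoid egress charges on write as well as read.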
References
- https://hail.zulipchat.com/#narrow/stream/127527-team/topic/batch.20cluster/near/417261935
Version
0.2.127
Relevant log output
No response