dsub

Multi-region support google-batch

Open · rivershah opened this issue 2 years ago · 8 comments

google-cls-v2 has powerful multi-region support through wildcard matching or lists of regions. It appears that google-batch lacks this feature.

Multi-region support is a much-loved and widely used feature of google-cls-v2. Can you please verify that google-batch indeed does not have it? If not, would it be possible to work with the Google Batch developers to introduce this feature before Cloud Life Sciences is removed? Thank you.
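As a rough illustration of what the wildcard matching mentioned above involves, the sketch below expands shell-style patterns such as `us-*` into concrete region names. It is a minimal sketch under stated assumptions, not dsub's or the Cloud Life Sciences API's actual implementation: `KNOWN_REGIONS` is a hypothetical, hard-coded catalog (a real tool would look up the available regions) and `expand_regions` is a hypothetical helper.

```python
from fnmatch import fnmatch

# Hypothetical, hard-coded region catalog for illustration only.
KNOWN_REGIONS = [
    "us-central1", "us-east1", "us-east4", "us-west1",
    "europe-west1", "europe-west4", "asia-east1",
]


def expand_regions(patterns: list[str], known: list[str] = KNOWN_REGIONS) -> list[str]:
    """Return every known region matching at least one shell-style pattern.

    Results follow the order of `known`, so each region appears once.
    """
    return [region for region in known if any(fnmatch(region, p) for p in patterns)]


print(expand_regions(["us-*"]))  # ['us-central1', 'us-east1', 'us-east4', 'us-west1']
print(expand_regions(["europe-west1", "asia-*"]))  # ['europe-west1', 'asia-east1']
```

Iterating over the catalog rather than over the patterns keeps the output deduplicated even when patterns overlap (for example `us-*` and `us-east*`).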

rivershah · Dec 04 '23 06:12

Thanks for the input @rivershah.

A cursory look indicates that Batch supports this via the LocationPolicy:

https://cloud.google.com/batch/docs/reference/rest/v1alpha/projects.locations.jobs#locationpolicy

We'll test it out and look to wire it up if it works as expected.
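For reference, Batch expresses placement through the job's `allocationPolicy.location` field, a LocationPolicy whose `allowedLocations` list holds entries such as `regions/us-central1` or `zones/us-central1-a`. The sketch below only builds a job request body in that REST JSON shape; it does not call the API. It assumes the v1 field names, and `batch_job_body` and the image URI are hypothetical.

```python
import json


def batch_job_body(image_uri: str, allowed_locations: list[str], task_count: int = 1) -> dict:
    """Build a Batch job request body (REST JSON shape) with a LocationPolicy."""
    return {
        "taskGroups": [
            {
                "taskCount": task_count,
                "taskSpec": {"runnables": [{"container": {"imageUri": image_uri}}]},
            }
        ],
        "allocationPolicy": {
            # LocationPolicy: candidate placements for the job's VMs.
            "location": {"allowedLocations": allowed_locations},
        },
    }


body = batch_job_body(
    "gcr.io/example-project/example-image:latest",  # hypothetical image
    ["zones/us-central1-a", "zones/us-central1-b"],
)
print(json.dumps(body, indent=2))
```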

mbookman · Dec 04 '23 17:12

@mbookman Thanks for looking. My understanding of the docs is that the Batch API will raise an error if multiple regions are specified:

Only one region or multiple zones in one region is supported now

It appears multi-region support may not have been ported over to Batch. I'll await the results of your experiments, as I can't get my jobs to schedule when I specify multiple regions: the VMs enter an error state.

For context, multi-region support is so useful because it greatly simplifies job submission for hard-to-find resources such as high-memory nodes and GPU accelerators. A large parallel GPU dsub TSV submission will typically find resources across geographically widely separated regions.
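The documented restriction quoted above (one region, or several zones within one region) can be expressed as a client-side pre-check that fails fast instead of letting VMs reach an error state. This is a minimal sketch with hypothetical helper names (`region_of`, `check_single_region`); it assumes zone names follow the `<region>-<suffix>` convention, and Batch's own server-side validation may accept or reject some inputs differently.

```python
def region_of(location: str) -> str:
    """Map "regions/us-central1" or "zones/us-central1-a" to "us-central1"."""
    kind, _, name = location.partition("/")
    if kind == "regions":
        return name
    if kind == "zones":
        # Assumes zone names are "<region>-<suffix>"; drop the suffix.
        return name.rsplit("-", 1)[0]
    raise ValueError(f"unrecognized location: {location!r}")


def check_single_region(allowed_locations: list[str]) -> str:
    """Return the single region covered by the locations, or raise ValueError."""
    regions = {region_of(loc) for loc in allowed_locations}
    if len(regions) != 1:
        raise ValueError(f"expected exactly one region, got {sorted(regions)}")
    return regions.pop()


print(check_single_region(["zones/us-central1-a", "zones/us-central1-f"]))  # us-central1
try:
    check_single_region(["regions/us-central1", "regions/us-east4"])
except ValueError as err:
    print(err)  # expected exactly one region, got ['us-central1', 'us-east4']
```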

rivershah · Dec 04 '23 21:12

Does dsub work with Google Batch?

anngregory · Dec 19 '23 14:12

@mbookman Happy new year! I am still fairly sure that Batch, as implemented on Google's side, does not support submitting a job across US-wide regions, which google-cls-v2 does allow. This would be a major feature regression. I don't think this is a dsub limitation.

Can you please verify whether what I am saying is correct? If so, we will need help determining whether this feature can be implemented in Batch.

rivershah · Jan 06 '24 22:01

@mbookman @wnojopra As google-cls-v2 is headed for removal soon, I'm requesting that we look at this feature regression. Thank you.

rivershah · Apr 05 '24 11:04

Hey @rivershah!

Sorry about the delay in following up. We did check in with the Batch team regarding this. The lack of multi-region support is currently intentional, in the sense that it was not considered to have high utility. It would be great if we could get more input from you on your use case and where you see it adding value.

One of the key reasons this feature was not added to Batch is the 2022 change in Cloud pricing, under which reading data in multi-region buckets from regional resources began to incur Data Transfer Out charges (formerly known as egress charges).

https://cloud.google.com/storage/pricing-announce#network

Reading data in a Cloud Storage bucket located in a multi-region from a Google Cloud service located in a region on the same continent will no longer be free; instead, such moves will be priced the same as general data moves between different locations on the same continent.

Data transfer (source → destination)      Price
Northern America → Northern America       $0.02/GB

Prior to those pricing changes, accessing data in a US multi-region bucket from any US region was free. So the Cloud view is that people will generally want to use regional buckets with regional VMs.

So can you share a use case where this pricing change has not affected you and where you would get high value from multiple regions?

Thanks.
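To make the pricing trade-off concrete, the transfer charge for a workload that reads its inputs from a multi-region bucket into regional VMs is roughly the data read times the per-GB rate. This is a back-of-the-envelope sketch, assuming the $0.02/GB Northern America rate from the table applies to every byte read and ignoring storage, operation, and any other charges; `transfer_cost_usd` and the workload numbers are hypothetical.

```python
RATE_USD_PER_GB = 0.02  # Northern America to Northern America rate quoted above


def transfer_cost_usd(gb_per_task: float, num_tasks: int, rate: float = RATE_USD_PER_GB) -> float:
    """Estimate transfer cost when every task reads gb_per_task GB across locations."""
    return gb_per_task * num_tasks * rate


# Hypothetical workload: 1,000 tasks, each reading 50 GB of input.
print(f"${transfer_cost_usd(50, 1000):,.2f}")  # $1,000.00
```

Whether a charge of that size matters depends on how it compares to the cost of waiting for scarce hardware in a single region, which is the question raised in this thread.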

mbookman · Apr 08 '24 16:04

Hi @mbookman,

Apologies for the delay. The multi-region feature is crucial for several reasons:

  • Hardware Flexibility: Users can't predict accelerator hardware and preemptible machine availability in advance. Multi-region support allows Google Batch to search across regions and find suitable machines.
  • Artifact Registry: Multi-region Artifact Registry repositories have pricing optimized for multi-region use, so region flexibility is beneficial.
  • Resource Availability: For machine learning, having access to GPUs across multiple regions is more valuable than saving on egress charges. This flexibility helps ensure that we can scale and schedule resources efficiently.
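Until Batch accepts multiple regions in one job, one hypothetical client-side workaround for the hardware-flexibility point above is to try a single-region job in each candidate region in turn. This is a minimal sketch, not a dsub or Batch API: `submit_with_region_fallback` and its `submit` callable are hypothetical stand-ins, and in practice "success" would have to mean the job actually obtained VMs, which requires polling job state.

```python
from typing import Callable, Optional


def submit_with_region_fallback(
    regions: list[str], submit: Callable[[str], bool]
) -> Optional[str]:
    """Try candidate regions in order; return the first accepted one, else None.

    `submit` is a hypothetical stand-in that would create a single-region job
    in the given region and report whether it obtained VMs.
    """
    for region in regions:
        if submit(region):
            return region
    return None


# Simulated capacity: only us-east4 currently has the requested GPUs.
available = {"us-east4"}
chosen = submit_with_region_fallback(
    ["us-central1", "us-east4", "us-west1"],
    submit=lambda region: region in available,
)
print(chosen)  # us-east4
```

Trying regions one at a time is sequential, so each failed attempt adds latency, and each region chosen may still incur the data transfer charges discussed earlier in the thread.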

rivershah · Jul 30 '24 10:07