
gsutil error while creating custom Image

Open John-Neff opened this issue 1 year ago • 5 comments

We are seeing an issue with gsutil when trying to create a custom image.

09:36:40 Traceback (most recent call last):
09:36:40   File "/usr/bin/google-cloud-sdk/platform/gsutil/gsutil", line 21, in <module>
09:36:40     gsutil.RunMain()
09:36:40   File "/usr/bin/google-cloud-sdk/platform/gsutil/gsutil.py", line 157, in RunMain
09:36:40     import gslib.main
09:36:40   File "/usr/bin/google-cloud-sdk/platform/gsutil/gslib/main.py", line 95, in <module>
09:36:40     from gslib.command_runner import CommandRunner
09:36:40   File "/usr/bin/google-cloud-sdk/platform/gsutil/gslib/command_runner.py", line 37, in <module>
09:36:40     from gslib.cloud_api_delegator import CloudApiDelegator

This seems to be linked to gsutil's incompatibility with newer versions of Python. Once we rolled back to Python 3.8, the issue went away. We solved this in our own scripts by switching to gcloud storage. There are four scripts within this repo that use gsutil. Can those be updated to use gcloud storage?
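For the simple call sites, the change is mostly a matter of swapping the command prefix. Here is a minimal Python sketch of that rewrite, assuming the scripts only call `cp`, `ls`, or `rm` and that the remaining flags are accepted by both tools (worth checking per call site). `to_gcloud_storage` is a hypothetical helper for illustration, not something that exists in this repo.

```python
import shlex

# Subcommands assumed to carry over directly; anything else needs manual review.
SIMPLE_SUBCOMMANDS = {"cp", "ls", "rm"}


def to_gcloud_storage(command: str) -> str:
    """Rewrite a simple `gsutil ...` command line as `gcloud storage ...`.

    Hypothetical helper: it only swaps the prefix and drops gsutil's
    top-level -m flag. It does not translate any other flags.
    """
    args = shlex.split(command)
    if not args or args[0] != "gsutil":
        raise ValueError("not a gsutil command: " + command)
    rest = args[1:]
    # gsutil's top-level -m enables parallelism; gcloud storage
    # parallelizes by default, so the flag is simply dropped.
    if rest and rest[0] == "-m":
        rest = rest[1:]
    if not rest or rest[0] not in SIMPLE_SUBCOMMANDS:
        raise ValueError("needs manual review: " + command)
    return shlex.join(["gcloud", "storage"] + rest)
```

Anything outside those three subcommands raises an error instead of being guessed at, so less common gsutil invocations still get a manual look.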

John-Neff avatar Feb 18 '25 15:02 John-Neff

Yes, I will get to it eventually, but if you supply your tested patch, my efforts can be better spent by reviewing your change and merging it in.

cjac avatar Feb 18 '25 16:02 cjac

https://github.com/search?q=repo%3AGoogleCloudDataproc%2Fcustom-images%20gsutil&type=code

There's not really a "patch" to review. Above are the 4 scripts where gsutil still exists. Per Google's own documentation (https://cloud.google.com/storage/docs/gsutil#should-you-use), gcloud storage should be used instead of gsutil.

Having gsutil within the Dataproc custom image scripts limits which versions of Python can be installed on the build nodes and can potentially break the build process.

John-Neff avatar Feb 18 '25 16:02 John-Neff

Well, yes, and... do we want to support the situation where customers are running[1] in a container that doesn't have access to gcloud storage still?

Would it make sense to pick either gsutil or gcloud storage based on the version of the gcloud SDK installed on the system from which the action is being run? I've recently made a change[2] to the GPU driver installer to prefer gcloud storage, but fall back to gsutil on legacy SDK versions.
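A rough sketch of that selection in Python, for discussion rather than as the change in [2]: it parses the `Google Cloud SDK x.y.z` line from `gcloud version` and prefers gcloud storage when the SDK is new enough. The 402.0.0 cutoff is a placeholder assumption that would need to be confirmed, and the function names are hypothetical.

```python
import re
import subprocess

# Assumed cutoff for illustration only; confirm the real minimum SDK version.
MIN_SDK_FOR_GCLOUD_STORAGE = (402, 0, 0)


def parse_sdk_version(version_output):
    """Extract (major, minor, patch) from `gcloud version` output, or None."""
    match = re.search(r"Google Cloud SDK (\d+)\.(\d+)\.(\d+)", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def storage_command(version_output, minimum=MIN_SDK_FOR_GCLOUD_STORAGE):
    """Return the argv prefix: gcloud storage if the SDK is new enough, else gsutil."""
    version = parse_sdk_version(version_output)
    if version is not None and version >= minimum:
        return ["gcloud", "storage"]
    return ["gsutil"]


def detect_storage_command():
    """Run `gcloud version` and choose a prefix; fall back to gsutil on failure."""
    try:
        result = subprocess.run(
            ["gcloud", "version"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return ["gsutil"]
    return storage_command(result.stdout)
```

Keeping the version parsing separate from the subprocess call means the decision logic can be tested without a gcloud installation, and an unparseable version string falls back to gsutil rather than failing.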

Technically speaking, my peers in Engineering have committed to supporting images as far back as 2.0.27-debian10 at this time.  Shouldn't we allow the init actions script to be run without modification from the -m node of any single-node instance we support?  If the Dockerfile[3] isn't the platform that we target, shouldn't the -m node of a single node cluster be the platform to target by default?

I'd really like to hear from users before I make an arbitrary decision that breaks your use case. Please chime in now.

[1] https://github.com/GoogleCloudDataproc/custom-images/tree/main/examples/secure-boot#examples

[2] https://github.com/GoogleCloudDataproc/initialization-actions/pull/1302/commits/2b9ac477178f060097fe56c5b9c11da2479d12f4#diff-906af1da0fd3633de5287bf7cd7dcd0cac7f06dce2340798d53bbab9d037aa7eR2232

[3] https://github.com/GoogleCloudDataproc/custom-images/blob/main/Dockerfile

cjac avatar Feb 18 '25 18:02 cjac

Okay, please review #112; once you have confirmed that this patch doesn't break your use case, I'll merge.

cjac avatar Feb 18 '25 20:02 cjac

Hello John, I'm finishing up the review now. Can you confirm that the code from #112, now updated to fix issues Chris found, continues to work in your environment?

cjac avatar Mar 29 '25 19:03 cjac