sagemaker-studio-image-build-cli

UnicodeEncodeError in zip.write()

Open athewsey opened this issue 3 years ago • 0 comments

I'm attempting to get this sample (which builds a container image from a notebook, in the "Define a SageMaker Model Monitor schedule" section) running in SageMaker Studio, using the new CLI.

Essentially there is a ./docker/ folder next to my notebook containing just a Dockerfile and evaluation.py script.

However when I run:

!sm-docker build ./docker --file ./docker/Dockerfile --repository sagemaker-processing-container:latest

(or the same command without the --file or --repository options, or with the :latest tag omitted), I get the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.6/zipfile.py", line 432, in _encodeFilenameFlags
    return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-31: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/bin/sm-docker", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_studio_image_build/cli.py", line 92, in main
    args.func(args, unknown)
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_studio_image_build/cli.py", line 53, in build_image
    args.repository, get_role(args), args.bucket, extra_args, log=not args.no_logs
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_studio_image_build/builder.py", line 68, in build_image
    bucket, key = upload_zip_file(repository, bucket, " ".join(extra_args))
  File "/opt/conda/lib/python3.6/site-packages/sagemaker_studio_image_build/builder.py", line 39, in upload_zip_file
    zip.write(f"{dirname}/{file}")
  File "/opt/conda/lib/python3.6/zipfile.py", line 1622, in write
    with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
  File "/opt/conda/lib/python3.6/zipfile.py", line 1355, in open
    return self._open_to_write(zinfo, force_zip64=force_zip64)
  File "/opt/conda/lib/python3.6/zipfile.py", line 1468, in _open_to_write
    self.fp.write(zinfo.FileHeader(zip64))
  File "/opt/conda/lib/python3.6/zipfile.py", line 422, in FileHeader
    filename, flag_bits = self._encodeFilenameFlags()
  File "/opt/conda/lib/python3.6/zipfile.py", line 434, in _encodeFilenameFlags
    return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 11-31: surrogates not allowed

It's a weird error so I could well be doing something stupid - but am wondering if there's an implicitly assumed encoding somewhere which is clashing with this kernel's environment?
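For context on the two messages in the traceback, here is a minimal sketch of how a string can fail both the 'ascii' and the 'utf-8' encode that zipfile attempts. It assumes (this is not confirmed for the Studio kernel) that a non-ASCII filename was decoded under an ASCII filesystem encoding with the surrogateescape error handler, which is what Python does for undecodable filename bytes. The filename "café.txt" is purely illustrative.

```python
# Bytes of a non-ASCII filename as they might be stored on disk (illustrative).
raw = "café.txt".encode("utf-8")

# Assumption: the process decodes filenames as ASCII with surrogateescape,
# so each undecodable byte becomes a lone surrogate code point.
name = raw.decode("ascii", errors="surrogateescape")


def try_encode(text, codec):
    """Return 'ok' if text encodes with codec, else the error reason."""
    try:
        text.encode(codec)
        return "ok"
    except UnicodeEncodeError as exc:
        return exc.reason


ascii_result = try_encode(name, "ascii")  # first attempt made by zipfile
utf8_result = try_encode(name, "utf-8")   # fallback attempt made by zipfile

# Encoding with the same error handler recovers the original bytes.
recovered = name.encode("ascii", errors="surrogateescape")

print(ascii_result)
print(utf8_result)
print(recovered == raw)
```

If this is what is happening, the file causing the failure does have a non-ASCII name, even if it is not one of the files in ./docker/.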

I don't have any special chars in filenames, and am running Studio kernel Python 3 (PyTorch CPU Optimized).
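One way to check the filename claim is to walk the directory tree and flag any path that cannot be encoded as UTF-8. This is only a sketch: the helper names are hypothetical, and scanning "." is an assumption based on the `zip.write(f"{dirname}/{file}")` line in the traceback, which suggests the tool walks a directory tree but does not show which one.

```python
import os
import sys


def is_utf8_encodable(path):
    """Return True if path can be encoded as UTF-8 (no lone surrogates)."""
    try:
        path.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def find_unencodable_paths(root):
    """Collect paths under root whose names would break a UTF-8 encode."""
    bad = []
    for dirname, _subdirs, files in os.walk(root):
        for file in files:
            path = os.path.join(dirname, file)
            if not is_utf8_encodable(path):
                bad.append(path)
    return bad


if __name__ == "__main__":
    # The filesystem encoding decides how filename bytes become str.
    print("filesystem encoding:", sys.getfilesystemencoding())
    for path in find_unencodable_paths("."):  # "." is an assumption
        # ascii() escapes the surrogates so the path can be printed safely.
        print(ascii(path))
```

Running this from the notebook's working directory should list any offending files, which may include data extracted next to the notebook rather than anything in ./docker/.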

Any ideas or insights greatly appreciated!

Full steps to reproduce

(From the referenced public sample above)

  • Add this package to the set of pip installs at the top
  • Replace the ! unzip ... command with something like the following (since Studio kernels don't have unzip installed by default):
    import zipfile
    with zipfile.ZipFile('GTSRB_Final_Test_Images.zip', 'r') as zip_ref:
        print('Unzipping...')
        zip_ref.extractall()
  • Split the cell containing # Create ECR repository and push docker image: execute just the first (Python) half, then run the above sm-docker command instead of the sample's !docker build ... line.

athewsey, Sep 16 '20 12:09