sagemaker-studio-image-build-cli
UnicodeEncodeError in zip.write()
I'm attempting to get this sample (which builds a container image from a notebook in the "Define a SageMaker Model Monitor schedule" section) running in SageMaker Studio, using the new CLI.
Essentially there is a ./docker/ folder next to my notebook containing just a Dockerfile and an evaluation.py script.
However when I run:
!sm-docker build ./docker --file ./docker/Dockerfile --repository sagemaker-processing-container:latest
(Or the same without specifying the --file or --repository options, or omitting the :latest tag) I get the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/zipfile.py", line 432, in _encodeFilenameFlags
return self.filename.encode('ascii'), self.flag_bits
UnicodeEncodeError: 'ascii' codec can't encode characters in position 11-31: ordinal not in range(128)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/opt/conda/bin/sm-docker", line 8, in <module>
sys.exit(main())
File "/opt/conda/lib/python3.6/site-packages/sagemaker_studio_image_build/cli.py", line 92, in main
args.func(args, unknown)
File "/opt/conda/lib/python3.6/site-packages/sagemaker_studio_image_build/cli.py", line 53, in build_image
args.repository, get_role(args), args.bucket, extra_args, log=not args.no_logs
File "/opt/conda/lib/python3.6/site-packages/sagemaker_studio_image_build/builder.py", line 68, in build_image
bucket, key = upload_zip_file(repository, bucket, " ".join(extra_args))
File "/opt/conda/lib/python3.6/site-packages/sagemaker_studio_image_build/builder.py", line 39, in upload_zip_file
zip.write(f"{dirname}/{file}")
File "/opt/conda/lib/python3.6/zipfile.py", line 1622, in write
with open(filename, "rb") as src, self.open(zinfo, 'w') as dest:
File "/opt/conda/lib/python3.6/zipfile.py", line 1355, in open
return self._open_to_write(zinfo, force_zip64=force_zip64)
File "/opt/conda/lib/python3.6/zipfile.py", line 1468, in _open_to_write
self.fp.write(zinfo.FileHeader(zip64))
File "/opt/conda/lib/python3.6/zipfile.py", line 422, in FileHeader
filename, flag_bits = self._encodeFilenameFlags()
File "/opt/conda/lib/python3.6/zipfile.py", line 434, in _encodeFilenameFlags
return self.filename.encode('utf-8'), self.flag_bits | 0x800
UnicodeEncodeError: 'utf-8' codec can't encode characters in position 11-31: surrogates not allowed
It's a weird error so I could well be doing something stupid - but am wondering if there's an implicitly assumed encoding somewhere which is clashing with this kernel's environment?
I don't have any special chars in filenames, and am running Studio kernel Python 3 (PyTorch CPU Optimized).
Any ideas or insights greatly appreciated!
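A minimal sketch for double-checking the "no special chars" assumption. The "surrogates not allowed" message is what Python raises when a string contains lone surrogate code points, which is how undecodable filename bytes are represented when the OS reports a name that doesn't match the filesystem encoding. The helper names below (`is_utf8_encodable`, `find_unencodable_names`) are hypothetical and not part of the CLI. Scanning `.` is also an assumption, since the traceback doesn't show which directory `upload_zip_file` walks.

```python
import os


def is_utf8_encodable(name):
    """Return True if the string can be encoded as UTF-8 (no lone surrogates)."""
    try:
        name.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


def find_unencodable_names(root="."):
    """Walk root and return every path that zipfile could not encode as UTF-8."""
    bad = []
    for dirname, dirs, files in os.walk(root):
        for name in dirs + files:
            path = os.path.join(dirname, name)
            if not is_utf8_encodable(path):
                bad.append(path)
    return bad


# Assumption: the files being zipped live under the current working directory.
for path in find_unencodable_names("."):
    # ascii() escapes lone surrogates, so printing the path cannot itself fail.
    print(ascii(path))
```

If this prints anything, the listed paths are candidates for the name that `zip.write()` fails on. They need not be inside ./docker if the build archives more than that folder.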
Full steps to reproduce
(From the referenced public sample above)
- Add this package to the set of pip installs at the top
- Replace the ! unzip ... command with something like the following (since Studio kernels don't have unzip installed by default):
import zipfile
with zipfile.ZipFile('GTSRB_Final_Test_Images.zip', 'r') as zip_ref:
    print('Unzipping...')
    zip_ref.extractall()
- Split the cell containing # Create ECR repository and push docker image: execute just the first (Python) half and run the above sm-docker command instead of the sample's !docker build ... line.