Unable to parse S3 CreateMultipartUpload response with invalid XML characters

Open floppym opened this issue 6 months ago • 3 comments

Describe the bug

botocore fails to process an S3 CreateMultipartUpload request when the response contains characters which are invalid according to XML 1.0. Commonly this happens when the S3 object key contains ASCII control characters.

Regression Issue

  • [ ] Select this option if this issue appears to be a regression.

Expected Behavior

The response should be processed without error.

Current Behavior

An exception is raised:

Traceback (most recent call last):
  File "awscli/botocore/parsers.py", line 537, in _parse_xml_string_to_dom
xml.etree.ElementTree.ParseError: reference to invalid character number: line 2, column 122
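The same error message can be reproduced without S3, using only Python's standard library. This is a minimal sketch: the XML fragment is a hypothetical stand-in for the real response body, and it assumes the BEL character in the key arrives as the numeric character reference `&#x7;`, which is what the "reference to invalid character number" message suggests.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment (not the actual S3 payload). Assumption:
# the BEL character in the key is serialized as the reference &#x7;.
body = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<Result><Key>test&#x7;.txt</Key></Result>"
)

try:
    ET.fromstring(body)
    error = None
except ET.ParseError as exc:
    # expat rejects references to code points outside the XML 1.0 Char range
    error = str(exc)

print(error)
```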

Reproduction Steps

Using awscli, run a command like the following:

aws s3api create-multipart-upload --bucket mdg-test-20250618 --key $'test\u0007.txt' --debug

Possible Solution

  1. Change the AWS S3 service so that it does not generate invalid XML in the CreateMultipartUpload response.
  2. Modify botocore to be more relaxed in its XML parsing code.
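As an illustration of what the second option could look like, the sketch below replaces invalid numeric character references before handing the text to ElementTree. It is not botocore's code or an agreed design; the names `replace_invalid_char_refs` and `parse_relaxed` are hypothetical, and substituting U+FFFD is an assumption made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Numeric character references: hexadecimal (&#x7;) or decimal (&#7;).
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def _is_xml10_char(cp):
    # Char production from the XML 1.0 specification.
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def replace_invalid_char_refs(xml_text, replacement="\ufffd"):
    """Swap references to non-XML-1.0 code points for a placeholder."""

    def _sub(match):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Valid references are left untouched for the parser to resolve.
        return match.group(0) if _is_xml10_char(cp) else replacement

    return _CHAR_REF.sub(_sub, xml_text)


def parse_relaxed(xml_text):
    """Parse XML after neutralizing invalid character references."""
    return ET.fromstring(replace_invalid_char_refs(xml_text))


root = parse_relaxed("<Result><Key>test&#x7;.txt</Key></Result>")
print(repr(root.find("Key").text))
```

This approach is lossy: the parsed key no longer matches the key that was sent, and the regular expression would also rewrite reference-like text inside a CDATA section. A real fix would have to decide how to preserve the original character.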

Additional Information/Context

debug.txt

SDK version used

aws-cli/2.27.37

Environment details (OS name and version, etc.)

Gentoo Linux

floppym avatar Jun 18 '25 17:06 floppym

Hi @floppym, thanks for reaching out. \u0007 (bell) is an ASCII control character that is prohibited in XML 1.0 documents (https://www.w3.org/TR/xml/#charsets). Per the S3 User Guide (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html), such characters require special handling, and the guide recommends avoiding control characters in object keys. Including them causes problems in XML API responses because standard XML parsers reject them. Please let me know if you have any questions. Thanks.

adev-code avatar Jun 20 '25 18:06 adev-code
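The XML 1.0 character rule cited above can be checked on the client before a key is chosen. The following is a minimal sketch; `find_invalid_xml_chars` is a hypothetical helper, not part of botocore or boto3, and it only tests the XML 1.0 Char production, not the full list of characters the S3 User Guide advises against.

```python
import re

# Anything outside the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML10 = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(key):
    """Return (index, character) pairs that XML 1.0 cannot represent."""
    return [(m.start(), m.group()) for m in _INVALID_XML10.finditer(key)]


print(find_invalid_xml_chars("test\u0007.txt"))
print(find_invalid_xml_chars("photos/2025/image.png"))
```

A caller could reject or rename a key whenever the returned list is non-empty, before issuing any S3 request.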

Yes, I am aware of that.

botocore should use an XML parser that is compatible with the bogus XML produced by the AWS S3 service.

floppym avatar Jun 20 '25 22:06 floppym

Alternatively, the AWS service should be changed to encode the key in an XML-compatible way, or reject the key altogether. Producing a broken XML response seems wrong.

If you can direct me to where I can report this issue to the AWS S3 folks, I would be happy to do that.

floppym avatar Jun 20 '25 22:06 floppym

Thanks for the reply. A key containing \u0007 results in a ParseError from Python's XML parser, which shows that the character is not compatible with XML. The S3 documentation lists "characters to avoid" to prevent these cases. It is the user's responsibility to handle these characters appropriately. The team has decided not to support this.

adev-code avatar Sep 16 '25 20:09 adev-code

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot] avatar Sep 16 '25 20:09 github-actions[bot]