Unable to parse S3 CreateMultipartUpload response with invalid XML characters
Describe the bug
botocore fails to parse the response to an S3 CreateMultipartUpload request when that response contains characters that are not allowed in XML 1.0. This commonly happens when the S3 object key contains ASCII control characters, because the response echoes the key back.
Regression Issue
- [ ] Select this option if this issue appears to be a regression.
Expected Behavior
The response should be processed without error.
Current Behavior
An exception is raised:
Traceback (most recent call last):
  File "awscli/botocore/parsers.py", line 537, in _parse_xml_string_to_dom
xml.etree.ElementTree.ParseError: reference to invalid character number: line 2, column 122
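The failure can be reproduced locally without calling AWS. This sketch assumes, as the error message suggests, that S3 serializes the bell character in the key as a numeric character reference (`&#7;` here; the exact form S3 emits is not shown in this report). The body is a hypothetical reconstruction of the `InitiateMultipartUploadResult` document, and `EXAMPLE-UPLOAD-ID` and `try_parse` are made up for illustration. Python's `xml.etree.ElementTree`, which the traceback shows botocore using, rejects it with the same kind of `ParseError`.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body modeled on InitiateMultipartUploadResult.
# Bucket name from the reproduction command; UploadId is a placeholder.
# The key "test\u0007.txt" is assumed to arrive as the reference &#7;.
BODY = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<InitiateMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">'
    b"<Bucket>mdg-test-20250618</Bucket>"
    b"<Key>test&#7;.txt</Key>"
    b"<UploadId>EXAMPLE-UPLOAD-ID</UploadId>"
    b"</InitiateMultipartUploadResult>"
)


def try_parse(body: bytes) -> str:
    """Return 'ok' if the body parses, otherwise the ParseError message."""
    try:
        ET.fromstring(body)
        return "ok"
    except ET.ParseError as exc:
        return str(exc)


print(try_parse(BODY))
```

Removing the `&#7;` reference from the body makes the same document parse cleanly, which isolates the control character as the cause.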
Reproduction Steps
Using awscli, run a command like the following:
aws s3api create-multipart-upload --bucket mdg-test-20250618 --key $'test\u0007.txt' --debug
Possible Solution
- Change the AWS S3 service so that it does not generate invalid XML in the CreateMultipartUpload response.
- Modify botocore to be more lenient when parsing XML responses.
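The second option could look roughly like the following sketch: before the body reaches the parser, rewrite any numeric character reference that points at a character XML 1.0 forbids. This is not botocore code or an existing botocore hook. `neutralize_bad_char_refs` and `_is_xml10_char` are hypothetical names, and the regex deliberately does not special-case CDATA sections or comments, which S3 responses are assumed not to contain.

```python
import re
import xml.etree.ElementTree as ET

# Decimal (&#7;) or hexadecimal (&#x7;) numeric character references.
# Naive on purpose: CDATA sections and comments are not special-cased.
_CHAR_REF = re.compile(rb"&#(x[0-9A-Fa-f]+|[0-9]+);")


def _is_xml10_char(cp: int) -> bool:
    """XML 1.0 Char production (section 2.2)."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def neutralize_bad_char_refs(body: bytes) -> bytes:
    """Replace references to characters XML 1.0 forbids with &#xFFFD; (U+FFFD)."""

    def repl(m: re.Match) -> bytes:
        ref = m.group(1)
        cp = int(ref[1:], 16) if ref.startswith(b"x") else int(ref)
        # Keep valid references untouched; swap invalid ones for U+FFFD.
        return m.group(0) if _is_xml10_char(cp) else b"&#xFFFD;"

    return _CHAR_REF.sub(repl, body)


# Hypothetical minimal body: parsing now succeeds, but the key is altered.
root = ET.fromstring(neutralize_bad_char_refs(b"<Key>test&#7;.txt</Key>"))
```

The substitution is lossy: the parsed `Key` becomes `test\ufffd.txt` rather than the real key, so a caller would have to keep using the key it originally passed in for the later `UploadPart` and `CompleteMultipartUpload` calls.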
Additional Information/Context
SDK version used
aws-cli/2.27.37
Environment details (OS name and version, etc.)
Gentoo Linux
Hi @floppym, thanks for reaching out. \u0007 (bell) is an ASCII control character, and it is prohibited in XML 1.0 documents: https://www.w3.org/TR/xml/#charsets. The S3 User Guide (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-keys.html) also notes that such characters require special handling and recommends avoiding control characters in object keys. They cause problems when included in XML API responses because standard XML parsers reject them. Please let me know if you have any questions. Thanks.
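The XML 1.0 `Char` production linked above permits only `#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`. A client that wants to follow the advice to avoid such characters could check keys against that production before uploading, as in this sketch. `invalid_xml_chars` and `check_key` are hypothetical helpers, not part of botocore or boto3, and they cover only XML 1.0 validity, not every character on the S3 User Guide's list of characters to avoid.

```python
import re

# Complement of the XML 1.0 Char production (section 2.2):
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_NOT_XML10_CHAR = re.compile(
    "[^\t\n\r\x20-\ud7ff\ue000-\ufffd\U00010000-\U0010ffff]"
)


def invalid_xml_chars(key: str) -> list[str]:
    """Return the characters in key that cannot appear in an XML 1.0 document."""
    return _NOT_XML10_CHAR.findall(key)


def check_key(key: str) -> None:
    """Hypothetical pre-flight check: reject keys S3 would echo back as invalid XML."""
    bad = invalid_xml_chars(key)
    if bad:
        codes = ", ".join(f"U+{ord(c):04X}" for c in bad)
        raise ValueError(f"object key contains characters not allowed in XML 1.0: {codes}")
```

For example, `check_key("test\u0007.txt")` raises a `ValueError` that mentions `U+0007`, while an ordinary key such as `photos/2025/cat.jpg` passes.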
Yes, I am aware of that.
botocore should use an XML parser that is compatible with the bogus XML produced by the AWS S3 service.
Alternatively, the AWS service should be changed to encode the key in an XML-compatible way, or reject the key altogether. Producing a broken XML response seems wrong.
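To illustrate the "encode the key in an XML-compatible way" alternative: percent-encoding the key turns every control character into plain ASCII (`\u0007` becomes `%07`), which any XML parser accepts, and the client can decode it afterwards. S3's list operations such as `ListObjectsV2` already offer a similar `EncodingType=url` option, but nothing in this thread indicates that `CreateMultipartUpload` supports it, so the service-side step below is purely hypothetical, as are the helper names.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote, unquote


def encode_key_for_xml(key: str) -> str:
    """Hypothetical service-side step: percent-encode the key (UTF-8) so only
    ASCII letters, digits, "_.-~", "/" and "%" remain, all legal in XML 1.0."""
    return quote(key, safe="/")


def decode_key_from_xml(encoded: str) -> str:
    """Hypothetical client-side step: undo the percent-encoding."""
    return unquote(encoded)


key = "test\u0007.txt"
body = "<Key>" + encode_key_for_xml(key) + "</Key>"
parsed = ET.fromstring(body).text  # "test%07.txt", parses without error
```

The trade-off is that every consumer must know the key is encoded; a client that forgets to decode would see `test%07.txt` instead of the real key.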
If you can direct me to where I can report this issue to the AWS S3 folks, I would be happy to do that.
Thanks for the reply. A key containing \u0007 causes Python's XML parser to raise a ParseError, which shows that such a response cannot be handled by a standard XML parser. The S3 documentation lists these under "characters to avoid" to prevent exactly these cases. It is the user's responsibility to handle these characters appropriately. The team has decided not to support this.
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.