Issue with MultiPartParser class
Checklist
- [x] The bug is reproducible against the latest release and/or
master. - [X] There are no similar issues or pull requests to fix it yet.
Describe the bug
MultiPartParser class fails to parse a request body produced by the client but not Postman (or similar).
To reproduce
from starlette.formparsers import MultiPartParser, FormParser
from starlette.datastructures import Headers
import asyncio
client_body = b'--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="fasta"; filename="sequences.fasta"\r\nContent-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="attributes"; filename="attributes.tsv"\r\nContent-Type: multipart/form-data\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n--xoxoxoxoxoxoxoxoxoxo_1600710720987--\r\n'
client_headers = Headers({'accept-encoding': 'gzip,deflate', 'content-length': '39940', 'content-type': 'multipart/form-data; boundary=xoxoxoxoxoxoxoxoxoxo_1600710720987', 'host': '10.0.5.13:80', 'connection': 'Keep-Alive', 'user-agent': 'KNIME Uploader'})
postman_body = b'----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="attributes"; filename="test-attribute_5.tsv"\r\nContent-Type: text/tab-separated-values\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="fasta"; filename="test-sequence_correct_5.fasta"\r\nContent-Type: application/octet-stream\r\n\r\n>P23G01_IgG1-1411:H:Q10C3:1/1:NID18\r\nCAGGTATTGAA\r\n\r\n----------------------------850116600781883365617864--\r\n'
postman_headers = Headers({'content-type': 'multipart/form-data; boundary=--------------------------850116600781883365617864', 'user-agent': 'PostmanRuntime/7.26.0', 'accept': '*/*', 'cache-control': 'no-cache', 'host': '10.0.5.13:80', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '2455'})
async def stream(message):
body = message.get("body", b"")
if body:
yield body
async def client_test_multipart():
parser = MultiPartParser(client_headers, stream({"body": client_body}))
form = await parser.parse()
print([(k, v) for k, v in form.items()])
return form
async def postman_test_multipart():
parser = MultiPartParser(postman_headers, stream({"body": postman_body}))
form = await parser.parse()
print([(k, v) for k, v in form.items()])
return form
loop = asyncio.get_event_loop()
loop.run_until_complete(client_test_multipart())
loop.run_until_complete(postman_test_multipart())
Expected behavior
client_test_multipart should read and return two files from the client_body as with postman_test_multipart and postman_body.
Actual behavior
client_test_multipart reads only one file from the request body produced by the client.
Debugging material
[('fasta', <starlette.datastructures.UploadFile object at 0x7f091c0b1a20>)]
[('attributes', <starlette.datastructures.UploadFile object at 0x7f091c0b1a58>), ('fasta', <starlette.datastructures.UploadFile object at 0x7f091c0b1828>)]
Environment
- OS: Linux
- Python version: 3.8
- Starlette version: 0.13.6
I have found the issue.
In the client_body, after the first file (Content-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n) there is a \n character instead of \r\n.
As per HTTP convention, \r\n should be used. However, this is a standard and not a rule and it would be nice if the MultiPartParser would parse both types of line breaks (as some other frameworks do) or at least throw an appropriate exception instead of an ambiguous 422 when using UploadFile.
In the meantime this is what I have managed to write to allow the both files to be parsed from the request:
@app.post("/upload")
async def upload(request: Request):
body = await request.body()
request._body = body.replace(b'\r\n', b'\n').replace(b'\n', b'\r\n')
inp = await request.form()
fasta = inp["fasta"]
attributes = inp["attributes"]
Okay, so a useful point to start with would be double checking how other frameworks handle the malformed input here.
Eg. what does Flask do? (I think you could try posting the request to https://httpbin.org/post to find out?)
What client are you using?
Hi @tomchristie ! Thank you for replying.
So before Starlette the service was using Flask and there were no issues - so Flask can handle the 'malformed' input.
I have also tried issuing a POST with the 'malformed' body to https://httpbin.org/post using this:
import http.client
import mimetypes
conn = http.client.HTTPSConnection("httpbin.org")
client_boundary = 'xoxoxoxoxoxoxoxoxoxo_1600710720987'
postman_boundary = '--------------------------850116600781883365617864'
client_payload = '--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="fasta"; filename="sequences.fasta"\r\nContent-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="attributes"; filename="attributes.tsv"\r\nContent-Type: multipart/form-data\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n--xoxoxoxoxoxoxoxoxoxo_1600710720987--\r\n'
postman_payload = '----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="attributes"; filename="test-attribute_5.tsv"\r\nContent-Type: text/tab-separated-values\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="fasta"; filename="test-sequence_correct_5.fasta"\r\nContent-Type: application/octet-stream\r\n\r\n>P23G01_IgG1-1411:H:Q10C3:1/1:NID18\r\nCAGGTATTGAA\r\n\r\n----------------------------850116600781883365617864--\r\n'
client_headers = {
'Content-type': 'multipart/form-data; boundary={}'.format(client_boundary)
}
postman_headers = {
'Content-type': 'multipart/form-data; boundary={}'.format(postman_boundary)
}
conn.request("POST", "/post", client_payload, client_headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))
conn.request("POST", "/post", postman_payload, postman_headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))
It returned both files and does recognise non-standard line breaks in the body.
The client is a KNIME pipeline and throughout execution sends HTTP calls using Palladian nodes. I have no access to the client code so I had to look for ways to overcome the issue using Starlette instead.
I would be interested to make a PR on this but can't decide whether it should be here. Knowing that Starlette uses python-multipart library for parsing the request perhaps a PR there would be more appropriate?
@tomchristie any update on this ? python-multipart doesn't seem to address this issue, however, this is quite blocking when using the UploadFile from FastAPI.
I'm using the requests library from python and I cannot manage to upload files, I always get a 422 Unprocessable Entity as response.
#1514 seemed to address this issue nicely given that the baize library seems much more active than python-multipart library. If this solution doesn't suits you, what else are you planning to do ?
Does anyone currently have a workaround to use UploadFile from FastAPI and python requests as client ?
Thanks in advance for your reply
@tomchristie any update on this ?
python-multipartdoesn't seem to address this issue, however, this is quite blocking when using theUploadFilefrom FastAPI.I'm using the requests library from python and I cannot manage to upload files, I always get a
422 Unprocessable Entityas response.#1514 seemed to address this issue nicely given that the
baizelibrary seems much more active thanpython-multipartlibrary. If this solution doesn't suits you, what else are you planning to do ?Does anyone currently have a workaround to use
UploadFilefrom FastAPI and pythonrequestsas client ?Thanks in advance for your reply
I still find it strange that after all this time since the issue has been raised the maintainers of python-multipart do not to even comment on the issue.
@hall-b python-multipart is only used for uploading multiple files at once. Perhaps you can do it one file at a time? If not, do try modifying the body of the request to make it standard as I did here.
just swap python-multipart to baize
I have the same problem with downloading files as the author. An stupid single-threaded cycle inside python-multipart kills all the performance of the library.
I have the same problem with downloading files as the author. An stupid single-threaded cycle inside python-multipart kills all the performance of the library.
yup i saw that code, there are two of these
# For each byte...
while i < length:
c = data[i]
if state == STATE_START:
# Skip leading newlines
if c == CR or c == LF:
i += 1
self.logger.debug("Skipping leading CR/LF at %d", i)
continue
# index is used as in index into our boundary. Set to 0.
index = 0
up
We've decided to vendor pyhon-multipart into Starlette - only the code that makes sense.
We've decided to vendor
pyhon-multipartinto Starlette - only the code that makes sense.
This decision was postponed.
PR welcome on python-multipart to solve this. Please refer to https://github.com/andrew-d/python-multipart/issues/31.
I've come across this issue when writing a test for endpoint that accepts a file. Because I am in control of the payload in this case, this is a client side workaround that worked for me - using requests to format the request body:
import requests
with open(file_path, "rb") as f:
body, content_type = requests.PreparedRequest()._encode_files(files={"file": f}, data={})
response = test_client.post("/document/new", content=body, headers={"Content-Type": content_type})