starlette Issue with MultiPartParser class

Checklist

[x] The bug is reproducible against the latest release and/or master.
[X] There are no similar issues or pull requests to fix it yet.

Describe the bug

MultiPartParser class fails to parse a request body produced by the client but not Postman (or similar).

To reproduce

from starlette.formparsers import MultiPartParser, FormParser
from starlette.datastructures import Headers
import asyncio


client_body = b'--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="fasta"; filename="sequences.fasta"\r\nContent-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="attributes"; filename="attributes.tsv"\r\nContent-Type: multipart/form-data\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n--xoxoxoxoxoxoxoxoxoxo_1600710720987--\r\n'
client_headers = Headers({'accept-encoding': 'gzip,deflate', 'content-length': '39940', 'content-type': 'multipart/form-data; boundary=xoxoxoxoxoxoxoxoxoxo_1600710720987', 'host': '10.0.5.13:80', 'connection': 'Keep-Alive', 'user-agent': 'KNIME Uploader'})

postman_body = b'----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="attributes"; filename="test-attribute_5.tsv"\r\nContent-Type: text/tab-separated-values\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="fasta"; filename="test-sequence_correct_5.fasta"\r\nContent-Type: application/octet-stream\r\n\r\n>P23G01_IgG1-1411:H:Q10C3:1/1:NID18\r\nCAGGTATTGAA\r\n\r\n----------------------------850116600781883365617864--\r\n'
postman_headers = Headers({'content-type': 'multipart/form-data; boundary=--------------------------850116600781883365617864', 'user-agent': 'PostmanRuntime/7.26.0', 'accept': '*/*', 'cache-control': 'no-cache', 'host': '10.0.5.13:80', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '2455'})


async def stream(message):

    body = message.get("body", b"")
    if body:
        yield body

async def client_test_multipart():

    parser = MultiPartParser(client_headers, stream({"body": client_body}))
    form = await parser.parse()
    print([(k, v) for k, v in form.items()])
    return form

async def postman_test_multipart():

    parser = MultiPartParser(postman_headers, stream({"body": postman_body}))
    form = await parser.parse()
    print([(k, v) for k, v in form.items()])
    return form

loop = asyncio.get_event_loop()
loop.run_until_complete(client_test_multipart())
loop.run_until_complete(postman_test_multipart())

Expected behavior

client_test_multipart should read and return two files from the client_body as with postman_test_multipart and postman_body.

Actual behavior

client_test_multipart reads only one file from the request body produced by the client.

Debugging material

[('fasta', <starlette.datastructures.UploadFile object at 0x7f091c0b1a20>)]
[('attributes', <starlette.datastructures.UploadFile object at 0x7f091c0b1a58>), ('fasta', <starlette.datastructures.UploadFile object at 0x7f091c0b1828>)]

Environment

OS: Linux
Python version: 3.8
Starlette version: 0.13.6

Sep 21 '20 21:09 aretasg

I have found the issue.

In the client_body, after the first file (Content-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n) there is a \n character instead of \r\n.

As per HTTP convention, \r\n should be used. However, this is a standard and not a rule and it would be nice if the MultiPartParser would parse both types of line breaks (as some other frameworks do) or at least throw an appropriate exception instead of an ambiguous 422 when using UploadFile.

In the meantime this is what I have managed to write to allow the both files to be parsed from the request:

@app.post("/upload")
async def upload(request: Request):

    body = await request.body()
    request._body = body.replace(b'\r\n', b'\n').replace(b'\n', b'\r\n')

    inp = await request.form()
    fasta = inp["fasta"]
    attributes = inp["attributes"]

Sep 21 '20 22:09 aretasg

Okay, so a useful point to start with would be double checking how other frameworks handle the malformed input here. Eg. what does Flask do? (I think you could try posting the request to https://httpbin.org/post to find out?)

What client are you using?

Sep 22 '20 08:09 lovelydinosaur

Hi @tomchristie ! Thank you for replying.

So before Starlette the service was using Flask and there were no issues - so Flask can handle the 'malformed' input.

I have also tried issuing a POST with the 'malformed' body to https://httpbin.org/post using this:

import http.client
import mimetypes


conn = http.client.HTTPSConnection("httpbin.org")

client_boundary = 'xoxoxoxoxoxoxoxoxoxo_1600710720987'
postman_boundary = '--------------------------850116600781883365617864'

client_payload = '--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="fasta"; filename="sequences.fasta"\r\nContent-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="attributes"; filename="attributes.tsv"\r\nContent-Type: multipart/form-data\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n--xoxoxoxoxoxoxoxoxoxo_1600710720987--\r\n'
postman_payload = '----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="attributes"; filename="test-attribute_5.tsv"\r\nContent-Type: text/tab-separated-values\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="fasta"; filename="test-sequence_correct_5.fasta"\r\nContent-Type: application/octet-stream\r\n\r\n>P23G01_IgG1-1411:H:Q10C3:1/1:NID18\r\nCAGGTATTGAA\r\n\r\n----------------------------850116600781883365617864--\r\n'

client_headers = {
   'Content-type': 'multipart/form-data; boundary={}'.format(client_boundary)
}
postman_headers = {
   'Content-type': 'multipart/form-data; boundary={}'.format(postman_boundary)
}

conn.request("POST", "/post", client_payload, client_headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))

conn.request("POST", "/post", postman_payload, postman_headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))

It returned both files and does recognise non-standard line breaks in the body.

The client is a KNIME pipeline and throughout execution sends HTTP calls using Palladian nodes. I have no access to the client code so I had to look for ways to overcome the issue using Starlette instead.

Sep 22 '20 10:09 aretasg

I would be interested to make a PR on this but can't decide whether it should be here. Knowing that Starlette uses python-multipart library for parsing the request perhaps a PR there would be more appropriate?

Sep 23 '20 15:09 aretasg

@tomchristie any update on this ? python-multipart doesn't seem to address this issue, however, this is quite blocking when using the UploadFile from FastAPI.

I'm using the requests library from python and I cannot manage to upload files, I always get a 422 Unprocessable Entity as response.

#1514 seemed to address this issue nicely given that the baize library seems much more active than python-multipart library. If this solution doesn't suits you, what else are you planning to do ?

Does anyone currently have a workaround to use UploadFile from FastAPI and python requests as client ?

Thanks in advance for your reply

Jun 02 '22 09:06 hall-b

@tomchristie any update on this ? python-multipart doesn't seem to address this issue, however, this is quite blocking when using the UploadFile from FastAPI.

I'm using the requests library from python and I cannot manage to upload files, I always get a 422 Unprocessable Entity as response.

#1514 seemed to address this issue nicely given that the baize library seems much more active than python-multipart library. If this solution doesn't suits you, what else are you planning to do ?

Does anyone currently have a workaround to use UploadFile from FastAPI and python requests as client ?

Thanks in advance for your reply

I still find it strange that after all this time since the issue has been raised the maintainers of python-multipart do not to even comment on the issue.

@hall-b python-multipart is only used for uploading multiple files at once. Perhaps you can do it one file at a time? If not, do try modifying the body of the request to make it standard as I did here.

Jun 05 '22 16:06 aretasg

just swap python-multipart to baize

Aug 28 '22 10:08 h0rn3t

I have the same problem with downloading files as the author. An stupid single-threaded cycle inside python-multipart kills all the performance of the library.

Sep 09 '22 06:09 artyom-ace

I have the same problem with downloading files as the author. An stupid single-threaded cycle inside python-multipart kills all the performance of the library.

yup i saw that code, there are two of these


        # For each byte...
        while i < length:
            c = data[i]

            if state == STATE_START:
                # Skip leading newlines
                if c == CR or c == LF:
                    i += 1
                    self.logger.debug("Skipping leading CR/LF at %d", i)
                    continue

                # index is used as in index into our boundary.  Set to 0.
                index = 0

Sep 09 '22 07:09 h0rn3t

up

Sep 29 '22 07:09 h0rn3t

We've decided to vendor pyhon-multipart into Starlette - only the code that makes sense.

Jan 31 '23 18:01 Kludex

We've decided to vendor pyhon-multipart into Starlette - only the code that makes sense.

This decision was postponed.

Feb 23 '23 10:02 Kludex

PR welcome on python-multipart to solve this. Please refer to https://github.com/andrew-d/python-multipart/issues/31.

Feb 27 '23 16:02 Kludex

I've come across this issue when writing a test for endpoint that accepts a file. Because I am in control of the payload in this case, this is a client side workaround that worked for me - using requests to format the request body:

import requests

with open(file_path, "rb") as f:
    body, content_type = requests.PreparedRequest()._encode_files(files={"file": f}, data={})
response = test_client.post("/document/new", content=body, headers={"Content-Type": content_type})

Jul 06 '23 12:07 martin-kokos