starlette icon indicating copy to clipboard operation
starlette copied to clipboard

Issue with MultiPartParser class

Open aretasg opened this issue 5 years ago • 9 comments

Checklist

  • [x] The bug is reproducible against the latest release and/or master.
  • [X] There are no similar issues or pull requests to fix it yet.

Describe the bug

MultiPartParser class fails to parse a request body produced by the client but not Postman (or similar).

To reproduce

from starlette.formparsers import MultiPartParser, FormParser
from starlette.datastructures import Headers
import asyncio


client_body = b'--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="fasta"; filename="sequences.fasta"\r\nContent-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="attributes"; filename="attributes.tsv"\r\nContent-Type: multipart/form-data\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n--xoxoxoxoxoxoxoxoxoxo_1600710720987--\r\n'
client_headers = Headers({'accept-encoding': 'gzip,deflate', 'content-length': '39940', 'content-type': 'multipart/form-data; boundary=xoxoxoxoxoxoxoxoxoxo_1600710720987', 'host': '10.0.5.13:80', 'connection': 'Keep-Alive', 'user-agent': 'KNIME Uploader'})

postman_body = b'----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="attributes"; filename="test-attribute_5.tsv"\r\nContent-Type: text/tab-separated-values\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="fasta"; filename="test-sequence_correct_5.fasta"\r\nContent-Type: application/octet-stream\r\n\r\n>P23G01_IgG1-1411:H:Q10C3:1/1:NID18\r\nCAGGTATTGAA\r\n\r\n----------------------------850116600781883365617864--\r\n'
postman_headers = Headers({'content-type': 'multipart/form-data; boundary=--------------------------850116600781883365617864', 'user-agent': 'PostmanRuntime/7.26.0', 'accept': '*/*', 'cache-control': 'no-cache', 'host': '10.0.5.13:80', 'accept-encoding': 'gzip, deflate, br', 'connection': 'keep-alive', 'content-length': '2455'})


async def stream(message):

    body = message.get("body", b"")
    if body:
        yield body

async def client_test_multipart():

    parser = MultiPartParser(client_headers, stream({"body": client_body}))
    form = await parser.parse()
    print([(k, v) for k, v in form.items()])
    return form

async def postman_test_multipart():

    parser = MultiPartParser(postman_headers, stream({"body": postman_body}))
    form = await parser.parse()
    print([(k, v) for k, v in form.items()])
    return form

loop = asyncio.get_event_loop()
loop.run_until_complete(client_test_multipart())
loop.run_until_complete(postman_test_multipart())

Expected behavior

client_test_multipart should read and return two files from the client_body as with postman_test_multipart and postman_body.

Actual behavior

client_test_multipart reads only one file from the request body produced by the client.

Debugging material

[('fasta', <starlette.datastructures.UploadFile object at 0x7f091c0b1a20>)]
[('attributes', <starlette.datastructures.UploadFile object at 0x7f091c0b1a58>), ('fasta', <starlette.datastructures.UploadFile object at 0x7f091c0b1828>)]

Environment

  • OS: Linux
  • Python version: 3.8
  • Starlette version: 0.13.6

aretasg avatar Sep 21 '20 21:09 aretasg

I have found the issue.

In the client_body, after the first file (Content-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n) there is a \n character instead of \r\n.

As per HTTP convention, \r\n should be used. However, this is a standard and not a rule and it would be nice if the MultiPartParser would parse both types of line breaks (as some other frameworks do) or at least throw an appropriate exception instead of an ambiguous 422 when using UploadFile.

In the meantime this is what I have managed to write to allow the both files to be parsed from the request:

@app.post("/upload")
async def upload(request: Request):

    body = await request.body()
    request._body = body.replace(b'\r\n', b'\n').replace(b'\n', b'\r\n')

    inp = await request.form()
    fasta = inp["fasta"]
    attributes = inp["attributes"]

aretasg avatar Sep 21 '20 22:09 aretasg

Okay, so a useful point to start with would be double checking how other frameworks handle the malformed input here. Eg. what does Flask do? (I think you could try posting the request to https://httpbin.org/post to find out?)

What client are you using?

lovelydinosaur avatar Sep 22 '20 08:09 lovelydinosaur

Hi @tomchristie ! Thank you for replying.

So before Starlette the service was using Flask and there were no issues - so Flask can handle the 'malformed' input.

I have also tried issuing a POST with the 'malformed' body to https://httpbin.org/post using this:

import http.client
import mimetypes


conn = http.client.HTTPSConnection("httpbin.org")

client_boundary = 'xoxoxoxoxoxoxoxoxoxo_1600710720987'
postman_boundary = '--------------------------850116600781883365617864'

client_payload = '--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="fasta"; filename="sequences.fasta"\r\nContent-Type: multipart/form-data\r\n\r\n>P23A07_IgM-1047:H:Q10C92:17/22:NID2093\nCAGGTTTCAGTAG\n--xoxoxoxoxoxoxoxoxoxo_1600710720987\r\nContent-Disposition: form-data; name="attributes"; filename="attributes.tsv"\r\nContent-Type: multipart/form-data\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n--xoxoxoxoxoxoxoxoxoxo_1600710720987--\r\n'
postman_payload = '----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="attributes"; filename="test-attribute_5.tsv"\r\nContent-Type: text/tab-separated-values\r\n\r\n"Campaign ID"\t"Plate Set ID"\t"No"\n\r\n----------------------------850116600781883365617864\r\nContent-Disposition: form-data; name="fasta"; filename="test-sequence_correct_5.fasta"\r\nContent-Type: application/octet-stream\r\n\r\n>P23G01_IgG1-1411:H:Q10C3:1/1:NID18\r\nCAGGTATTGAA\r\n\r\n----------------------------850116600781883365617864--\r\n'

client_headers = {
   'Content-type': 'multipart/form-data; boundary={}'.format(client_boundary)
}
postman_headers = {
   'Content-type': 'multipart/form-data; boundary={}'.format(postman_boundary)
}

conn.request("POST", "/post", client_payload, client_headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))

conn.request("POST", "/post", postman_payload, postman_headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))

It returned both files and does recognise non-standard line breaks in the body.

The client is a KNIME pipeline and throughout execution sends HTTP calls using Palladian nodes. I have no access to the client code so I had to look for ways to overcome the issue using Starlette instead.

aretasg avatar Sep 22 '20 10:09 aretasg

I would be interested to make a PR on this but can't decide whether it should be here. Knowing that Starlette uses python-multipart library for parsing the request perhaps a PR there would be more appropriate?

aretasg avatar Sep 23 '20 15:09 aretasg

@tomchristie any update on this ? python-multipart doesn't seem to address this issue, however, this is quite blocking when using the UploadFile from FastAPI.

I'm using the requests library from python and I cannot manage to upload files, I always get a 422 Unprocessable Entity as response.

#1514 seemed to address this issue nicely given that the baize library seems much more active than python-multipart library. If this solution doesn't suits you, what else are you planning to do ?

Does anyone currently have a workaround to use UploadFile from FastAPI and python requests as client ?

Thanks in advance for your reply

hall-b avatar Jun 02 '22 09:06 hall-b

@tomchristie any update on this ? python-multipart doesn't seem to address this issue, however, this is quite blocking when using the UploadFile from FastAPI.

I'm using the requests library from python and I cannot manage to upload files, I always get a 422 Unprocessable Entity as response.

#1514 seemed to address this issue nicely given that the baize library seems much more active than python-multipart library. If this solution doesn't suits you, what else are you planning to do ?

Does anyone currently have a workaround to use UploadFile from FastAPI and python requests as client ?

Thanks in advance for your reply

I still find it strange that after all this time since the issue has been raised the maintainers of python-multipart do not to even comment on the issue.

@hall-b python-multipart is only used for uploading multiple files at once. Perhaps you can do it one file at a time? If not, do try modifying the body of the request to make it standard as I did here.

aretasg avatar Jun 05 '22 16:06 aretasg

just swap python-multipart to baize

h0rn3t avatar Aug 28 '22 10:08 h0rn3t

I have the same problem with downloading files as the author. An stupid single-threaded cycle inside python-multipart kills all the performance of the library.

artyom-ace avatar Sep 09 '22 06:09 artyom-ace

I have the same problem with downloading files as the author. An stupid single-threaded cycle inside python-multipart kills all the performance of the library.

yup i saw that code, there are two of these


        # For each byte...
        while i < length:
            c = data[i]

            if state == STATE_START:
                # Skip leading newlines
                if c == CR or c == LF:
                    i += 1
                    self.logger.debug("Skipping leading CR/LF at %d", i)
                    continue

                # index is used as in index into our boundary.  Set to 0.
                index = 0

h0rn3t avatar Sep 09 '22 07:09 h0rn3t

up

h0rn3t avatar Sep 29 '22 07:09 h0rn3t

We've decided to vendor pyhon-multipart into Starlette - only the code that makes sense.

Kludex avatar Jan 31 '23 18:01 Kludex

We've decided to vendor pyhon-multipart into Starlette - only the code that makes sense.

This decision was postponed.

Kludex avatar Feb 23 '23 10:02 Kludex

PR welcome on python-multipart to solve this. Please refer to https://github.com/andrew-d/python-multipart/issues/31.

Kludex avatar Feb 27 '23 16:02 Kludex

I've come across this issue when writing a test for endpoint that accepts a file. Because I am in control of the payload in this case, this is a client side workaround that worked for me - using requests to format the request body:

import requests

with open(file_path, "rb") as f:
    body, content_type = requests.PreparedRequest()._encode_files(files={"file": f}, data={})
response = test_client.post("/document/new", content=body, headers={"Content-Type": content_type})

martin-kokos avatar Jul 06 '23 12:07 martin-kokos