python-pyodata icon indicating copy to clipboard operation
python-pyodata copied to clipboard

Fix buffer size when binary data is returned in multipart response

Open bartonip opened this issue 4 years ago • 4 comments

When binary data is returned in a response, len is not sufficient to determine the required buffer size. getsizeof has been used instead to provide a more accurate buffer value.

bartonip avatar Feb 22 '21 06:02 bartonip

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

CLAassistant avatar Feb 22 '21 06:02 CLAassistant

Apart from the line comment, your change fails builds because of linter failure.

Relevant for you is the: ************* Module pyodata.v2.service pyodata/v2/service.py:143: [C0301(line-too-long), ] Line too long (134/120)

the getsizeof will give 'big enough' value to read the whole content

Sadly, together with your change came new version of pylint, so other two failures are not relevant, but new rules applied to old code.

I have fixed this in PR #146, so now the upgrade of linters will happen deterministically and with its own build-per-PR. Pls pull from master into your branch , so the build will pass and we can merge the PR. You can run the linter locally with pip install -r dev-requirements.txt and pylint --rcfile=.pylintrc --output-format=parseable --reports=no pyodata

phanak-sap avatar Feb 22 '21 07:02 phanak-sap

Hi @bartonip, I'd like to anwer #issuecomment-1011475876 here, because it's related to this issue.

The issue with len vs getsizeof is really that the HTTP response itself at the lowest level is a byte stream that gets translated into a string by the requests library.

I agree, that's probably a bug, that will truncate the response body and I think this will happen, when there are many multibyte characters in the payload. The len is taken including the header, so the bug will be effective, if the extra bytes of the multibyte characters are more than the size of the header.

My understanding of what happens is that in the case of say, image data that is sent as pure binary (not as a base64 string represented as binary) when you run len on that resultant byte string the count will be too small due to bytes like \x00 not actually counting as a character in a string.

However, if you receive arbitrary binary data, you will have "surprises" anyway. Like you've noted before, the data is decoded as UTF-8 (actually I think this happens in pyodata, not in the requests library). Not every arbitrary series of bytes is valid UTF-8 (e.g. 0xffis not) and you will see conversion errors.

The OData spec knows different primitive types and defines a specific encoding for each of them. There is no type, that permits the server to send arbitrary binary data (except when adressing the $value property and I don't think, pyodata supports that).

As the affected lines of code are called through a few indirections, it's really guesswork to say, how you could run into this bug. So if you did, could you boil this down to a reproducer or even a testcase?

inxonic avatar Jan 17 '22 21:01 inxonic

Yep the situation where I've run into receiving pure binary data is receiving it through the $value property.

The encoding of the string becomes irrelevant when using getsizeof because, unless I am mistaken, it measures the size of the region of memory the variable is stored in. So even though there are invalid bytes for UTF-8 like 0xff, the byte is still factored into the size because it is in that region of memory. This is the best recollection I can give you from when I was last using the pyodata package almost a year ago so I may be wrong.

I will work on a test case this weekend, although no promises as I am not familiar with this codebase anymore.

bartonip avatar Jan 17 '22 21:01 bartonip

Hi @bartonip I understand that you do not want to work on this PR anymore.

But since you came into contact with this bug - and seems still valid for me - could you please provide if not test that reproduces the problem, at least attach a file with response that creates problem with the response.read(len(data)).

I was not able to reproduce your problem and seems @inxonic was not able as well.

phanak-sap avatar Jul 14 '23 07:07 phanak-sap