aiohttp icon indicating copy to clipboard operation
aiohttp copied to clipboard

Multipart base64 encoding as RFC 2045

Open alex-eri opened this issue 8 years ago • 20 comments

What do these changes do?

Fixes #2087 Trivial, but need discussion.

https://docs.python.org/3/library/base64.html#base64.encodebytes

Are there changes in behavior for the user?

No affected for user, but can broke some side apps:

Maybe unexpected '\n' because http exprcts '\r\n' .

Lose performance

>>> timeit.timeit('base64.b64encode(data)','import base64;data= b"i"*1000')
3.0879702609963715
>>> timeit.timeit('base64.encodebytes(data)','import base64;data= b"i"*1000')
15.915579186999821

Related issue number

#2087

Checklist

  • [x] I think the code is well written
  • [ ] Unit tests for the changes exist
  • [ ] Documentation reflects the changes
  • [ ] If you provide code modification, please add yourself to CONTRIBUTORS.txt
    • The format is <Name> <Surname>.
    • Please keep alphabetical order, the file is sorted by names.
  • [ ] Add a new news fragment into the changes folder
    • name it <issue_id>.<type> for example (588.bug)
    • if you don't have an issue_id change it to the pr id after creating the pr
    • ensure type is one of the following:
      • .feature: Signifying a new feature.
      • .bugfix: Signifying a bug fix.
      • .doc: Signifying a documentation improvement.
      • .removal: Signifying a deprecation or removal of public API.
      • .misc: A ticket has been closed, but it is not of interest to users.
    • Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."

alex-eri avatar Jul 24 '17 22:07 alex-eri

@kxepal thoughts?

fafhrd91 avatar Jul 24 '17 22:07 fafhrd91

@manuco one line for fix. You can make regression tests

alex-eri avatar Jul 24 '17 23:07 alex-eri

I suggest to NOT merge it besause of

>>> timeit.timeit('base64.b64encode(data)','import base64;data= b"i"*1000')
3.0879702609963715
>>> timeit.timeit('base64.encodebytes(data)','import base64;data= b"i"*1000')
15.915579186999821

alex-eri avatar Jul 24 '17 23:07 alex-eri

does speed matter for this code?

fafhrd91 avatar Jul 24 '17 23:07 fafhrd91

This kind of Multipart is often big amount of data (mjpeg video stream for example). It difficult to cache on proxy. So speed preferred then readability.

updated topic

alex-eri avatar Jul 24 '17 23:07 alex-eri

Base64 is not designed to be used on pure http streams : http can handle binary streams. If speed matters, on big data like mpeg streams, base64 should't be considered as a good transport encoding...

IMHO, the speed problem is not a major one.

edit: typo

manuco avatar Jul 26 '17 09:07 manuco

@alex-eri: what's the best workflow to include the unit test, and some others fixes for this problem on the code ?

  • Should I fork + PR your repo, or should I fork + PR the main repo ? Or should I just push on your branch once you allow me to do that ?

manuco avatar Jul 26 '17 09:07 manuco

I agree with @manuco that speed is not a concern here: the base64 encoding is not the default one when sending a multipart request, and you have to explicitly enable it. Sending huge files encoded in base64 is a waste of resources anyway.

The 6x overhead for encodebytes over b64encode is insane though, opening a bug report against Python for this might be a good idea

arthurdarcet avatar Jul 26 '17 10:07 arthurdarcet

@manuco i dont git pro =) Dont know the best way. You can create new PR. I close this

https://github.com/alex-eri/aiohttp/invitations

alex-eri avatar Jul 26 '17 11:07 alex-eri

I've pushed my proposal on your repo on the branch patch-5. If everyone accept it, you can create a Pull Request for merging into the main repo, and close #2128 and #2087.

manuco avatar Jul 26 '17 13:07 manuco

u are shure for?

  •            enc_chunk = enc_chunk.replace(b"\n", b"\r\n")
    

imaybe is there better way to do this?

what would happen if concatenate two or more chunks (one with tail < 76 chars)? one string with longer or less length.

manually splitting it maybe

alex-eri avatar Jul 26 '17 13:07 alex-eri

Codecov Report

:exclamation: No coverage uploaded for pull request base (master@4b3a609). Click here to learn what that means. The diff coverage is 75%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master    #2128   +/-   ##
=========================================
  Coverage          ?   97.08%           
=========================================
  Files             ?       38           
  Lines             ?     7715           
  Branches          ?     1343           
=========================================
  Hits              ?     7490           
  Misses            ?      102           
  Partials          ?      123
Impacted Files Coverage Δ
aiohttp/multipart.py 95.57% <75%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 4b3a609...ab0a228. Read the comment docs.

codecov-io avatar Jul 26 '17 13:07 codecov-io

Ha! I see the problem:

  • with multiple write(), there will be output lines that will not be at the maximum size, or even contains a padding in the middle of the string. I missed that, and now I understand the work on the "multiple of 3" split stuff.

I'll have to write some test case testing this case, and the code of it. For today, I don't have any more time for this. I'll do this (I hope) before the end of the week.

Is it allowed to rewrite the branch history ?

manuco avatar Jul 26 '17 14:07 manuco

For the replace() stuff, it only touches the encoded part of the stream. So wathever the input content is, it can't be converted with "\n" in the base64 format. So we can safely transform it as long as we don't used any of the base64 alphabet.

But since this first implementation has the issue seen earlier, I'll try a smarter approach.

manuco avatar Jul 26 '17 14:07 manuco

Is it allowed to rewrite the branch history ?

Do what you want, it's your branch :) Even git push --force is allowed.

asvetlov avatar Jul 27 '17 08:07 asvetlov

Why test is failed?

asvetlov avatar Aug 10 '17 09:08 asvetlov

The test shows that when one stream reader reads multiple chunks, the produced base64 document is not valid. It means that my previous code doesn't work well. It was my first step about producing a better solution for base64 :

  1. create a test that shows the problem
  2. fix the problem

The next step will be done tomorrow or the next week (i'm busy with many other things to do).

manuco avatar Aug 10 '17 16:08 manuco

What is the status of PR?

asvetlov avatar Feb 27 '18 09:02 asvetlov

Still on my todolist (but not on the top) but I miss time to complete it for now. I'll try to prioritize it for the next weeks.

manuco avatar Feb 27 '18 13:02 manuco

Good

asvetlov avatar Feb 27 '18 13:02 asvetlov