Multipart base64 encoding per RFC 2045
## What do these changes do?
Fixes #2087. Trivial, but needs discussion.
https://docs.python.org/3/library/base64.html#base64.encodebytes
## Are there changes in behavior for the user?
No direct effect for the user, but it can break some side apps:
- possibly an unexpected `'\n'`, because HTTP expects `'\r\n'`;
- a performance loss:
```python
>>> timeit.timeit('base64.b64encode(data)', 'import base64; data = b"i"*1000')
3.0879702609963715
>>> timeit.timeit('base64.encodebytes(data)', 'import base64; data = b"i"*1000')
15.915579186999821
```
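For reference, a minimal illustration of the behavioral difference between the two functions (based on their documented behavior; the sample data is arbitrary):

```python
import base64

data = b"i" * 100

# b64encode produces a single unbroken line with no trailing newline.
single_line = base64.b64encode(data)
assert b"\n" not in single_line

# encodebytes produces RFC 2045-style output: lines of at most 76
# characters, each terminated by b"\n" -- not the b"\r\n" HTTP expects.
wrapped = base64.encodebytes(data)
assert all(len(line) <= 76 for line in wrapped.splitlines())
assert wrapped.endswith(b"\n") and not wrapped.endswith(b"\r\n")
```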
## Related issue number
#2087
## Checklist
- [x] I think the code is well written
- [ ] Unit tests for the changes exist
- [ ] Documentation reflects the changes
- [ ] If you provide code modification, please add yourself to `CONTRIBUTORS.txt`
  - The format is `<Name> <Surname>`.
  - Please keep alphabetical order, the file is sorted by names.
- [ ] Add a new news fragment into the `changes` folder
  - name it `<issue_id>.<type>` for example (588.bug)
  - if you don't have an `issue_id` change it to the pr id after creating the pr
  - ensure type is one of the following:
    * `.feature`: Signifying a new feature.
    * `.bugfix`: Signifying a bug fix.
    * `.doc`: Signifying a documentation improvement.
    * `.removal`: Signifying a deprecation or removal of public API.
    * `.misc`: A ticket has been closed, but it is not of interest to users.
  - Make sure to use full sentences with correct case and punctuation, for example: "Fix issue with non-ascii contents in doctest text files."
@kxepal thoughts?
@manuco it's a one-line fix. You can write the regression tests.
I suggest NOT merging it, because of:
```python
>>> timeit.timeit('base64.b64encode(data)', 'import base64; data = b"i"*1000')
3.0879702609963715
>>> timeit.timeit('base64.encodebytes(data)', 'import base64; data = b"i"*1000')
15.915579186999821
```
Does speed matter for this code?
This kind of multipart often carries a big amount of data (an MJPEG video stream, for example). It is difficult to cache on a proxy, so speed is preferred over readability.
Updated the topic.
Base64 is not designed to be used on pure HTTP streams: HTTP can handle binary streams. If speed matters, on big data like MPEG streams, base64 shouldn't be considered a good transport encoding...
IMHO, the speed problem is not a major one.
edit: typo
@alex-eri: what's the best workflow to include the unit test and some other fixes for this problem in the code?
- Should I fork + PR your repo, or fork + PR the main repo? Or should I just push to your branch once you allow me to?
I agree with @manuco that speed is not a concern here: the base64 encoding is not the default one when sending a multipart request, and you have to explicitly enable it. Sending huge files encoded in base64 is a waste of resources anyway.
The ~5x overhead for encodebytes over b64encode is insane though; opening a bug report against Python for this might be a good idea.
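For what it's worth, if the overhead lives in encodebytes itself, the wrapping can be done around the fast b64encode instead. A minimal sketch (the helper name is mine, not the PR's code), which also produces the CRLF line endings discussed above:

```python
import base64

def b64encode_rfc2045(data: bytes, line_length: int = 76) -> bytes:
    """Encode with the C-accelerated b64encode, then wrap the result
    at 76 characters using the CRLF line endings RFC 2045 mandates."""
    encoded = base64.b64encode(data)
    lines = [encoded[i:i + line_length]
             for i in range(0, len(encoded), line_length)]
    return b"\r\n".join(lines) + b"\r\n"
```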
@manuco I'm no git pro =) I don't know the best way. You can create a new PR; I'll close this one.
https://github.com/alex-eri/aiohttp/invitations
I've pushed my proposal to your repo on the branch patch-5. If everyone accepts it, you can create a Pull Request to merge it into the main repo and close #2128 and #2087.
Are you sure about

```python
enc_chunk = enc_chunk.replace(b"\n", b"\r\n")
```

Maybe there is a better way to do this?
What would happen if you concatenate two or more chunks (one with a tail < 76 chars)? You would get lines longer or shorter than 76 chars.
Maybe split it manually?
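One way to do that manual splitting safely across chunk boundaries is to buffer the input so that only multiples of 3 bytes are ever encoded, which keeps padding out of the middle of the stream. A sketch under that assumption (the class name is hypothetical, not the PR's code):

```python
import base64

class ChunkedBase64Encoder:
    """Hypothetical sketch: encode a stream chunk by chunk without
    emitting padding anywhere except at the very end of the stream."""

    def __init__(self) -> None:
        self._tail = b""  # 0-2 bytes left over from the previous chunk

    def encode(self, chunk: bytes) -> bytes:
        data = self._tail + chunk
        # Encode only a multiple of 3 bytes; base64 pads anything
        # shorter, and padding is valid only at the end of the stream.
        cut = len(data) - len(data) % 3
        self._tail, data = data[cut:], data[:cut]
        return base64.b64encode(data)

    def flush(self) -> bytes:
        # Encode the leftover bytes; this is the only place where
        # the b"=" padding may legitimately appear.
        tail, self._tail = self._tail, b""
        return base64.b64encode(tail)
```

The 76-character line wrapping would still have to be applied on top of this, over the concatenated output rather than per chunk.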
Codecov Report
No coverage uploaded for pull request base (`master@4b3a609`). The diff coverage is 75%.
```
@@           Coverage Diff            @@
##             master    #2128   +/-  ##
==========================================
  Coverage          ?   97.08%
==========================================
  Files             ?       38
  Lines             ?     7715
  Branches          ?     1343
==========================================
  Hits              ?     7490
  Misses            ?      102
  Partials          ?      123
```

| Impacted Files | Coverage Δ |
|---|---|
| aiohttp/multipart.py | 95.57% <75%> (ø) |
Continue to review the full report at Codecov.
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.
Powered by Codecov. Last update 4b3a609...ab0a228.
Ha! I see the problem:
- With multiple write() calls, there will be output lines that are not at the maximum size, or that even contain padding in the middle of the string. I missed that, and now I understand the work on the "multiple of 3" split stuff.

I'll have to write a test case covering this, and the code for it. I don't have any more time for this today; I'll do it (I hope) before the end of the week.
Is it allowed to rewrite the branch history?
For the replace() stuff: it only touches the encoded part of the stream. Whatever the input content is, it cannot produce "\n" inside the base64 output, so we can safely transform it, as long as we don't use any character of the base64 alphabet as a replacement target.
But since this first implementation has the issue seen earlier, I'll try a smarter approach.
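That safety argument is easy to sanity-check: the standard base64 alphabet contains neither LF nor CR, so any b"\n" in the encoded stream can only be a line break inserted by the encoder (a quick check, not part of the PR):

```python
import string

BASE64_ALPHABET = string.ascii_letters + string.digits + "+/="
assert "\n" not in BASE64_ALPHABET
assert "\r" not in BASE64_ALPHABET
# Hence replacing b"\n" with b"\r\n" in encodebytes() output can
# never corrupt the payload; it only rewrites the line breaks.
```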
> Is it allowed to rewrite the branch history?
Do what you want, it's your branch :)
Even `git push --force` is allowed.
Why does the test fail?
The test shows that when the stream reader reads in multiple chunks, the produced base64 document is not valid, which means my previous code doesn't work well. It was my first step towards producing a better solution for base64:
- create a test that shows the problem (see the sketch below)
- fix the problem
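A regression test along these lines might look like the following sketch (hypothetical, not the PR's actual test; it reuses the ChunkedBase64Encoder sketched above and checks that chunked input still yields a decodable document with padding only at the end):

```python
import base64

def test_chunked_base64_roundtrip():
    enc = ChunkedBase64Encoder()  # hypothetical encoder sketched above
    payload = b"x" * 100
    out = b"".join([
        enc.encode(payload[:7]),
        enc.encode(payload[7:60]),
        enc.encode(payload[60:]),
        enc.flush(),
    ])
    # Padding must only appear at the very end of the stream.
    assert b"=" not in out.rstrip(b"=")
    # The concatenated chunks must decode back to the original payload.
    assert base64.b64decode(out) == payload
```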
The next step will be done tomorrow or next week (I'm busy with many other things).
What is the status of this PR?
Still on my todo list (but not at the top); I lack the time to complete it for now. I'll try to prioritize it in the coming weeks.
Good