Poor performance when reading large xmlrpc data
| BPO | 26049 |
|---|---|
| Nosy | @loewis, @terryjreedy, @cedk, @serhiy-storchaka |
| Files | |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = None
created_at = <Date 2016-01-08.13:36:38.129>
labels = ['expert-XML', 'performance']
title = 'Poor performance when reading large xmlrpc data'
updated_at = <Date 2016-05-18.14:01:38.146>
user = 'https://bugs.python.org/pokoli'
bugs.python.org fields:
activity = <Date 2016-05-18.14:01:38.146>
actor = 'Oscar Andres Alvarez Montero'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['XML']
creation = <Date 2016-01-08.13:36:38.129>
creator = 'pokoli'
dependencies = []
files = ['41535', '41536', '41545', '42043', '42044']
hgrepos = []
issue_num = 26049
keywords = ['patch']
message_count = 16.0
messages = ['257756', '257757', '257759', '257760', '257764', '257774', '257811', '257815', '257816', '260444', '260767', '260768', '260791', '260970', '260976', '260978']
nosy_count = 7.0
nosy_names = ['loewis', 'terry.reedy', 'ced', 'serhiy.storchaka', 'pokoli', 'resteve', 'Oscar Andres Alvarez Montero']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue26049'
versions = ['Python 3.6']
By default, the python xmlrpclib parser reads response data in chunks of 1024 bytes [1], which leads to a lot of data concatenations when reading large responses, and this is very slow in Python.
Increasing the chunk size from 1024 bytes to a higher value improves performance.
On the same machine, we have tested with 20 MB of data and got the following results:
Chunks of 1024: 1min 48.933491sec. Chunks of 10 * 1024 * 1024: 0.245282sec.
We have chosen 10 * 1024 * 1024 as it is the same value used in bpo-792570.
We have done our tests on Python 2.7, but the same code exists on the default branch [2] (and on the 3.x branches as well [3][4][5][6]), so I believe all versions are affected.
I can work on a patch if you think this change is acceptable.
IMHO it's logical to allow the developer to customize the chunk size instead of using a hard-coded one.
[1] https://hg.python.org/cpython/file/2.7/Lib/xmlrpclib.py#l1479 [2] https://hg.python.org/cpython/file/default/Lib/xmlrpc/client.py#l1310 [3] https://hg.python.org/cpython/file/3.5/Lib/xmlrpc/client.py#l1310 [4] https://hg.python.org/cpython/file/3.4/Lib/xmlrpc/client.py#l1324 [5] https://hg.python.org/cpython/file/3.3/Lib/xmlrpc/client.py#l1316 [6] https://hg.python.org/cpython/file/3.2/Lib/xmlrpc/client.py#l1301
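For reference, the read loop in question looks roughly like this (a simplified sketch of the stdlib code linked above, not the exact implementation):

```python
def parse_response(stream, parser):
    # Simplified sketch: the response is read in fixed 1024-byte
    # chunks, and each chunk is fed to the XML parser separately.
    # With a 20 MB response this means roughly 20,000 read/feed calls.
    while True:
        data = stream.read(1024)  # hard-coded chunk size
        if not data:
            break
        parser.feed(data)
```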
I don't think it is necessary to allow customizing the chunk size. Python should instead provide a good default value that works on all platforms.
I attach a patch for the default branch.
And also another patch for the 2.7 branch.
I added a new patch which addresses the review comments.
I believe performance enhancements are, by default, limited to 'default', but this one might be a good candidate for backport.
Reading with a 10 MB limit allocates a 10 MB buffer. It may be better to start with a 1024-byte limit and double it on every iteration.
Would it not be better to just call stream.read() without any argument? HTTPResponse will then call _safe_read with just the length given by the header.
Answering myself: it is better to read by chunks so the parser is also fed by chunks. So here is a version of the patch which doubles the chunk size on every loop iteration.
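The doubling strategy can be sketched as follows (a hypothetical illustration of the approach, not the actual patch; the 10 * 1024 * 1024 cap is an assumption borrowed from the earlier discussion):

```python
def read_response(stream, parser, initial=1024, maximum=10 * 1024 * 1024):
    # Start small so tiny responses stay cheap, then double the chunk
    # size each iteration so large responses need few read/feed calls.
    size = initial
    while True:
        data = stream.read(size)
        if not data:
            break
        parser.feed(data)
        size = min(size * 2, maximum)  # cap the buffer growth
```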
ping
Could you please provide a microbenchmark that exposes the benefit of this optimization?
Is there infrastructure already in place for such a microbenchmark?
There is no special infrastructure. You can just provide two files: one script should implement a simple server that produces large output, and the other should implement a client that makes a number of requests and measures the time. Of course you can implement this in one file using multiprocessing if you prefer.
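A minimal single-file version of such a microbenchmark might look like this (a sketch only; the payload size, request count, and use of a thread instead of multiprocessing are all choices made here for brevity):

```python
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

PAYLOAD = b"x" * (1024 * 1024)  # 1 MB here; the original report used 20 MB

def start_server():
    # Port 0 lets the OS pick a free port; return it for the client.
    srv = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    srv.register_function(lambda: xmlrpc.client.Binary(PAYLOAD), "big")
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv.server_address[1]

def bench(port, requests=5):
    # Time a number of requests that each transfer the full payload.
    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d/" % port)
    start = time.perf_counter()
    for _ in range(requests):
        assert proxy.big().data == PAYLOAD
    return time.perf_counter() - start

if __name__ == "__main__":
    port = start_server()
    print("5 requests of 1 MB: %.3fs" % bench(port))
```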
Here are the client/server scripts. I don't measure a big performance improvement with them. I think the improvements measured in msg257756 are linked to the way xmlrpclib is overridden in Tryton.
Thanks Cédric.
There is no improvement for the first request. But if requests are repeated, there is a small (about 5%) improvement. This is too small.
One advantage I see is when xmlrpclib is overridden to use another marshaller which cannot be fed chunk by chunk. Then reducing the number of calls to feed will have a bigger impact. But I don't know if this is enough for Python.
From the discussion it seems that the optimization idea was rejected. Shall we close this issue?