Poor performance when reading large xmlrpc data
| BPO | 26049 |
|---|---|
| Nosy | @loewis, @terryjreedy, @cedk, @serhiy-storchaka |
| Files | |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
GitHub fields:
assignee = None
closed_at = None
created_at = <Date 2016-01-08.13:36:38.129>
labels = ['expert-XML', 'performance']
title = 'Poor performance when reading large xmlrpc data'
updated_at = <Date 2016-05-18.14:01:38.146>
user = 'https://bugs.python.org/pokoli'
bugs.python.org fields:
activity = <Date 2016-05-18.14:01:38.146>
actor = 'Oscar Andres Alvarez Montero'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['XML']
creation = <Date 2016-01-08.13:36:38.129>
creator = 'pokoli'
dependencies = []
files = ['41535', '41536', '41545', '42043', '42044']
hgrepos = []
issue_num = 26049
keywords = ['patch']
message_count = 16.0
messages = ['257756', '257757', '257759', '257760', '257764', '257774', '257811', '257815', '257816', '260444', '260767', '260768', '260791', '260970', '260976', '260978']
nosy_count = 7.0
nosy_names = ['loewis', 'terry.reedy', 'ced', 'serhiy.storchaka', 'pokoli', 'resteve', 'Oscar Andres Alvarez Montero']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'performance'
url = 'https://bugs.python.org/issue26049'
versions = ['Python 3.6']
By default, the python xmlrpclib parser reads response data in chunks of 1024 bytes [1], which leads to a lot of data concatenations when reading large responses, and this is very slow in Python.
Increasing the chunk size from 1024 bytes to a higher value improves performance.
On the same machine, we have tested with 20 MB of data and got the following results:
Chunks of 1024: 1min 48.933491sec. Chunks of 10 * 1024 * 1024: 0.245282sec.
We have chosen 10 * 1024 * 1024 as it is the same value used in bpo-792570.
We have done our tests on Python 2.7, but the same code exists on the default branch [2] (and on the 3.x branches as well [3][4][5][6]), so I believe all versions are affected.
I can work on a patch if you think this change is acceptable.
IMHO it's logical to allow the developer to customize the chunk size instead of using a hard-coded one.
[1] https://hg.python.org/cpython/file/2.7/Lib/xmlrpclib.py#l1479 [2] https://hg.python.org/cpython/file/default/Lib/xmlrpc/client.py#l1310 [3] https://hg.python.org/cpython/file/3.5/Lib/xmlrpc/client.py#l1310 [4] https://hg.python.org/cpython/file/3.4/Lib/xmlrpc/client.py#l1324 [5] https://hg.python.org/cpython/file/3.3/Lib/xmlrpc/client.py#l1316 [6] https://hg.python.org/cpython/file/3.2/Lib/xmlrpc/client.py#l1301
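For reference, the read loop in question looks roughly like this (a simplified sketch of the stdlib code linked above, not the exact implementation):

```python
def parse_response(stream, parser):
    # Simplified sketch: the response is read in fixed 1024-byte
    # chunks, and each chunk is fed to the XML parser separately.
    # With a 20 MB response this means roughly 20,000 read/feed calls.
    while True:
        data = stream.read(1024)  # hard-coded chunk size
        if not data:
            break
        parser.feed(data)
```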
I don't think it is necessary to allow customizing the chunk size. Python should instead provide a good default value that works on all platforms.
I attach a patch for the default branch.
And also another patch for the 2.7 branch.
I added a new patch which addresses the review comments.
I believe performance enhancements are, by default, limited to 'default', but this one might be a good candidate for backport.
Reading with a 10 MB limit allocates a 10 MB buffer. It may be better to start with a 1024-byte limit and double it on every iteration.
Would it not be better to just call stream.read() without any argument? HTTPResponse will then call _safe_read with just the length given by the header.
Answering myself: it is better to read by chunks so the parser is also fed by chunks. So here is a version of the patch which doubles the chunk size on every loop iteration.
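The doubling strategy can be sketched as follows (a hypothetical illustration of the approach, not the actual patch; the 10 * 1024 * 1024 cap is an assumption borrowed from the earlier discussion):

```python
def read_response(stream, parser, initial=1024, maximum=10 * 1024 * 1024):
    # Start small so tiny responses stay cheap, then double the chunk
    # size each iteration so large responses need few read/feed calls.
    size = initial
    while True:
        data = stream.read(size)
        if not data:
            break
        parser.feed(data)
        size = min(size * 2, maximum)  # cap the buffer growth
```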
ping
Could you please provide a microbenchmark that exposes the benefit of this optimization?
Is there infrastructure already in place for such a microbenchmark?
There is no special infrastructure. You can just provide two files: one script should implement a simple server that produces large output, and the other should implement a client that makes a number of requests and measures the time. Of course you can implement this in one file using multiprocessing if you prefer.
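A minimal single-file version of such a microbenchmark might look like this (a sketch only; the payload size, request count, and use of a thread instead of multiprocessing are all choices made here for brevity):

```python
import threading
import time
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCServer

PAYLOAD = b"x" * (1024 * 1024)  # 1 MB here; the original report used 20 MB

def start_server():
    # Port 0 lets the OS pick a free port; return it for the client.
    srv = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    srv.register_function(lambda: xmlrpc.client.Binary(PAYLOAD), "big")
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv.server_address[1]

def bench(port, requests=5):
    # Time a number of requests that each transfer the full payload.
    proxy = xmlrpc.client.ServerProxy("http://127.0.0.1:%d/" % port)
    start = time.perf_counter()
    for _ in range(requests):
        assert proxy.big().data == PAYLOAD
    return time.perf_counter() - start

if __name__ == "__main__":
    port = start_server()
    print("5 requests of 1 MB: %.3fs" % bench(port))
```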
Here are the client/server scripts. I don't measure a big performance improvement with them. I think the improvements measured in msg257756 are linked to the way xmlrpclib is overridden in Tryton.
Thanks Cédric.
There is no improvement for the first request. But if requests are repeated, there is a small (about 5%) improvement. This is too small.
One advantage I see is when xmlrpclib is overridden to use another marshaller which cannot be fed chunk by chunk. Then reducing the number of calls to feed will have a bigger impact. But I don't know if this is enough for Python.
From the discussion it seems that the optimization idea was rejected. Shall we close this issue?