
Poor performance when reading large xmlrpc data

Open 20043c4f-8f01-4424-bddf-f27bd900cd35 opened this issue 9 years ago • 17 comments

BPO 26049
Nosy @loewis, @terryjreedy, @cedk, @serhiy-storchaka
Files
  • python27.patch
  • default.patch
  • default.patch
  • client.py
  • server.py
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


    GitHub fields:

    assignee = None
    closed_at = None
    created_at = <Date 2016-01-08.13:36:38.129>
    labels = ['expert-XML', 'performance']
    title = 'Poor performance when reading large xmlrpc data'
    updated_at = <Date 2016-05-18.14:01:38.146>
    user = 'https://bugs.python.org/pokoli'
    

    bugs.python.org fields:

    activity = <Date 2016-05-18.14:01:38.146>
    actor = 'Oscar Andres Alvarez Montero'
    assignee = 'none'
    closed = False
    closed_date = None
    closer = None
    components = ['XML']
    creation = <Date 2016-01-08.13:36:38.129>
    creator = 'pokoli'
    dependencies = []
    files = ['41535', '41536', '41545', '42043', '42044']
    hgrepos = []
    issue_num = 26049
    keywords = ['patch']
    message_count = 16.0
    messages = ['257756', '257757', '257759', '257760', '257764', '257774', '257811', '257815', '257816', '260444', '260767', '260768', '260791', '260970', '260976', '260978']
    nosy_count = 7.0
    nosy_names = ['loewis', 'terry.reedy', 'ced', 'serhiy.storchaka', 'pokoli', 'resteve', 'Oscar Andres Alvarez Montero']
    pr_nums = []
    priority = 'normal'
    resolution = None
    stage = 'patch review'
    status = 'open'
    superseder = None
    type = 'performance'
    url = 'https://bugs.python.org/issue26049'
    versions = ['Python 3.6']
    

    By default, the Python xmlrpclib parser reads data in chunks of 1024 bytes [1], which leads to a lot of string concatenation when reading large responses, and repeated concatenation is very slow in Python.
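
    (For reference, the loop in question looks roughly like the sketch below. This is a simplified paraphrase of Transport.parse_response, omitting the gzip and verbose-logging handling; the function and parameter names are illustrative.)

    ```python
    def parse_response_sketch(stream, parser, unmarshaller):
        """Simplified sketch of the read loop in xmlrpc.client.Transport
        (xmlrpclib.Transport on 2.7)."""
        while True:
            data = stream.read(1024)  # hard-coded 1024-byte chunk
            if not data:
                break
            parser.feed(data)         # feed the XML parser incrementally
        parser.close()
        return unmarshaller.close()
    ```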

    Increasing the chunk size from 1024 bytes to a higher value noticeably improves performance.

    On the same machine, we tested with 20 MB of data and got the following results:

    Chunks of 1024: 1 min 48.933491 s
    Chunks of 10 * 1024 * 1024: 0.245282 s

    We chose 10 * 1024 * 1024, as it is the same value used in bpo-792570.

    We have done our tests on Python 2.7, but the same code exists in the default branch [2] (and in the 3.x branches [3][4][5][6]), so I believe all versions are affected.

    I can work on a patch if you think this change is acceptable.

    IMHO it's logical to allow the developer to customize the chunk size instead of using a hard-coded one (see the sketch after the references below).

    [1] https://hg.python.org/cpython/file/2.7/Lib/xmlrpclib.py#l1479
    [2] https://hg.python.org/cpython/file/default/Lib/xmlrpc/client.py#l1310
    [3] https://hg.python.org/cpython/file/3.5/Lib/xmlrpc/client.py#l1310
    [4] https://hg.python.org/cpython/file/3.4/Lib/xmlrpc/client.py#l1324
    [5] https://hg.python.org/cpython/file/3.3/Lib/xmlrpc/client.py#l1316
    [6] https://hg.python.org/cpython/file/3.2/Lib/xmlrpc/client.py#l1301
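
    (One way a developer could do that today, without any stdlib change, is to subclass the transport. The ChunkedTransport name and chunk_size attribute below are hypothetical, and the override is simplified from the stdlib version, which also handles gzip-encoded responses.)

    ```python
    import xmlrpc.client

    class ChunkedTransport(xmlrpc.client.Transport):
        """Hypothetical transport with a configurable read chunk size."""
        chunk_size = 10 * 1024 * 1024  # 10 MiB instead of the stdlib's 1 KiB

        def parse_response(self, response):
            # Simplified override of Transport.parse_response.
            p, u = self.getparser()
            while True:
                data = response.read(self.chunk_size)
                if not data:
                    break
                p.feed(data)
            p.close()
            return u.close()

    proxy = xmlrpc.client.ServerProxy('http://localhost:8000/',
                                      transport=ChunkedTransport())
    ```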

    I don't think it is necessary to allow customizing the chunk size. Instead, Python should provide a good default value that works for all platforms.

    I attach a patch with the fix for the default branch.

    And also another patch for the 2.7 branch.

    I added a new patch which addresses the review comments.

    I believe performance enhancements are, by default, limited to the 'default' branch, but this one might be a good candidate for backport.

    terryjreedy avatar Jan 08 '16 19:01 terryjreedy

    Reading with a 10 MB limit allocates a 10 MB buffer. It may be better to start with a 1024-byte limit and double it on every iteration.

    serhiy-storchaka avatar Jan 09 '16 09:01 serhiy-storchaka

    Would it not be better to just call stream.read() without any argument? Because HTTPResponse will then call _safe_read with just the length from the header.

    Answering myself: it is better to read by chunks so that the parser is also fed by chunks. So here is a version of the patch which doubles the chunk size on every loop iteration.
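
    (The attached patch itself is not reproduced in the migrated issue. A sketch of the approach described, where the cap on the chunk size is a purely illustrative assumption, might look like this:)

    ```python
    def parse_response_doubling(stream, parser, unmarshaller):
        """Sketch: start at 1024 bytes and double the chunk size on each
        iteration, so small responses stay cheap while large ones need
        only a logarithmic number of read() calls."""
        chunk_size = 1024
        max_chunk_size = 10 * 1024 * 1024  # hypothetical upper bound
        while True:
            data = stream.read(chunk_size)
            if not data:
                break
            parser.feed(data)
            chunk_size = min(chunk_size * 2, max_chunk_size)
        parser.close()
        return unmarshaller.close()
    ```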

    Could you please provide a microbenchmark that exposes the benefit of this optimization?

    serhiy-storchaka avatar Feb 24 '16 08:02 serhiy-storchaka

    Is there infrastructure already in place for such a microbenchmark?

    There is no special infrastructure. You can just provide two files. One script should implement a simple server that produces large output. The other script should implement a client that makes a number of requests and measures the time. Of course you can implement this in one file using multiprocessing if you prefer.

    serhiy-storchaka avatar Feb 24 '16 11:02 serhiy-storchaka

    Here are the client/server scripts. I don't measure a big performance improvement with them. I think the improvement measured in msg257756 is linked to the way xmlrpclib is overridden in Tryton.
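
    (The attached client.py and server.py are not reproduced in the migrated issue. A minimal reconstruction of the setup described above might look like the following; the host, port, payload size, and request count are assumptions.)

    ```python
    # server.py -- sketch of a server producing a large XML-RPC response
    from xmlrpc.server import SimpleXMLRPCServer

    def get_data():
        return 'x' * (20 * 1024 * 1024)  # ~20 MB response payload

    server = SimpleXMLRPCServer(('localhost', 8000), logRequests=False)
    server.register_function(get_data)
    server.serve_forever()
    ```

    ```python
    # client.py -- sketch of a client timing repeated large requests
    import time
    import xmlrpc.client

    proxy = xmlrpc.client.ServerProxy('http://localhost:8000/')
    start = time.perf_counter()
    for _ in range(10):
        proxy.get_data()
    print('10 requests in %.3f s' % (time.perf_counter() - start))
    ```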

    Thanks Cédric.

    There is no improvement for the first request. But if requests are repeated, there is a small (about 5%) improvement. This is too small.

    serhiy-storchaka avatar Feb 28 '16 13:02 serhiy-storchaka

    One advantage I see is when xmlrpclib is overridden to use another marshaller which cannot be fed chunk by chunk. Then reducing the number of calls to feed would have a bigger impact. But I don't know if this is enough for Python.

    From the discussion it seems that the optimization idea was rejected. Shall we close this issue?

    iritkatriel avatar Aug 15 '22 21:08 iritkatriel