
Allow users to specify Packer's internal buf_size

Open · rtobar opened this issue 2 years ago · 7 comments

Giving this flexibility to users means that internal reallocations can be avoided if the buffer size is good enough, at the expense of potentially allocating more memory than needed. This, together with reusing a Packer object, means that multiple serialisations can end up requiring no memory allocations other than the initial buffer creation, which can be a big win in some situations.

The default value is still 1MB, making this backwards compatible.
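
A minimal usage sketch of the proposed parameter (the 16 MB figure is just an illustrative value):

import msgpack

# Pre-size the internal buffer so typical payloads never trigger a reallocation.
packer = msgpack.Packer(buf_size=16 * 1024 * 1024)
data = packer.pack(list(range(1_000_000)))  # returns bytes; no buffer growth if it fits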

rtobar · Jul 18 '23 13:07

@methane a gentle ping to bring attention to this fairly simple PR I opened about 10 days ago. I saw that you have been contributing the most to this file lately, so I imagine you are a good person to review it.

rtobar · Jul 28 '23 02:07

> a gentle ping to bring attention to this fairly simple PR

Adding a public API is not "simple". I need to consider many aspects of it.

If you want to help the review, please show a benchmark result that demonstrates how much reallocation matters.

> This, together with reusing a Packer object, means that multiple serialisations can end up requiring no memory allocations other than the initial buffer creation, which can be a big win in some situations.

Reusing Packer objects already avoids repeated buffer growth even without this change, so it doesn't make a big win.
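
For example, a pattern like this already amortises buffer growth after the first large message (chunks and send here are placeholders):

packer = msgpack.Packer(autoreset=False)
for chunk in chunks:
    packer.reset()        # resets the length but keeps the already-grown buffer
    packer.pack(chunk)
    send(packer.bytes())  # only the first oversized chunk forces a reallocation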

methane · Jul 28 '23 03:07

@methane thanks for the fast answer, much appreciated.

> Adding a public API is not "simple". I need to consider many aspects of it.

True, I think my previous statement was an oversimplification. I was mostly hoping that, since it was a well-constrained and backwards-compatible change, it wouldn't be too much of a burden to include it. Of course you will have a better view of what else adding this would entail.

> Reusing Packer objects already avoids repeated buffer growth even without this change, so it doesn't make a big win.

True as well; I was thinking this would be more of a win for users of msgpack.packb, although combining both would give you a zero-allocation scenario.
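
For instance, since msgpack.packb constructs a fresh Packer from its keyword arguments, a one-shot call could avoid reallocations roughly like this (payload is a placeholder, and buf_size assumes this PR is merged):

msgpack.packb(payload, default=mnp.encode, buf_size=200 * 1024 * 1024)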

> If you want to help the review, please show a benchmark result that demonstrates how much reallocation matters.

This is a smaller version of the benchmark I was running. We use xarray, but you can simplify it further by using a plain numpy array. The version here doesn't include the buf_size=... argument, which I set to 200 MB when testing. This was run on Linux x64 with Python 3.10/3.11. Results with 3.11 basically don't show the big drop at ~64 MB; I'm assuming something changed in Python's memory allocators in 3.11 that removes that cliff. If you open the plots next to each other you'll see (noise aside) that packb performs slightly better when it can allocate only once. Timings use Timer.autorange, so the best/worst-case scenarios are probably a bit more extreme than shown.

import msgpack
import msgpack_numpy as mnp
import numpy as np
import xarray as xa
import timeit
import functools
import seaborn as sns
import pandas as pd
import matplotlib.pyplot

to_dict = lambda x: x.to_dict(data="array")
mser = lambda x: msgpack.packb(x, default=mnp.encode)
# Reused Packer: autoreset=False keeps the internal buffer alive between calls
packer = msgpack.Packer(default=mnp.encode, autoreset=False)
def mser_packer(x):
    packer.reset()  # clears the length but keeps the allocated buffer
    packer.pack(x)
    return packer

all_benchmarks = {
    "write_only": {
        'Dataset -> dict + msgpack.packb': [to_dict, mser],
        'Dataset -> dict + msgpack.Packer': [to_dict, mser_packer],
    }
}

def benchmark_for_sizes(functions, nitems):
    results = []
    for nitem in nitems:
        arr = np.random.rand(nitem)
        xarr = xa.Dataset({"x": ("i", arr), "y": ("i", arr + 30)})  # explicit dim name needed for raw arrays
        size = xarr.nbytes
        print(f"  {nitem=}, {size=}")
        timer = timeit.Timer('functools.reduce(lambda v, f: f(v), functions, xarr)', setup="import functools", globals=locals())
        n_executions, total_duration = timer.autorange()
        duration = total_duration / n_executions
        results.append((size, duration))
    return results


def run_benchmarks(nitems, benchmarks):
    results = {}
    for name, functions in benchmarks.items():
        print(f"Benchmarking {name}")
        results[name] = benchmark_for_sizes(functions, nitems)
    flat_results = list((name, size, duration) for name, values in results.items() for size, duration in values)
    df = pd.DataFrame(flat_results, columns=("Benchmark", "Size", "Duration"))
    df['Size [MB]'] = df['Size'] / 1024 / 1024
    df['Speed [MB/s]'] = df['Size'] / 1024 / 1024 / df["Duration"]
    return df

def run_all():
    sns.set_theme()
    nitems = list(range(10000000, 2000000, -120000))
    dfs = []
    for group, benchmarks in all_benchmarks.items():
        df = run_benchmarks(nitems, benchmarks)
        df["Group"] = group
        dfs.append(df)

    df = pd.concat(dfs)
    sns.relplot(data=df, x='Size [MB]', y='Speed [MB/s]', kind='line', hue='Benchmark', col='Group')
    matplotlib.pyplot.savefig('benchmark_results.png')

if __name__ == '__main__':
    run_all()
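
For reference, the buf_size run only changed the Packer construction, using the keyword from this PR:

packer = msgpack.Packer(default=mnp.encode, autoreset=False, buf_size=200 * 1024 * 1024)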

[Plots: Python 3.10, no buf_size vs buf_size=200 MB (310_nobufsize, 310_bufsize)]

[Plots: Python 3.11, no buf_size vs buf_size=200 MB (311_nobufsize, 311_bufsize)]

rtobar · Jul 28 '23 04:07

@methane: a gentle ping in case you have any further thoughts on whether you'd be willing to accept this change, given the evidence above and the hopefully small maintenance burden.

rtobar · Aug 03 '23 03:08

Thank you for the detailed report. I am leaning toward incorporating this feature, but will not do so now. I will make a final decision when I start developing v1.1.

methane · Aug 03 '23 06:08

There is no schedule for v1.1 yet. At least one more release (v1.0.6) will go out when Python 3.12 becomes RC.

methane · Aug 03 '23 06:08

Thanks @methane, much appreciated

rtobar · Aug 03 '23 06:08

committed in #604.

methane · May 05 '24 15:05