Allow users to specify Packer's internal buf_size
Giving users this flexibility means that internal reallocations can be avoided when the chosen buffer size is large enough, at the expense of potentially allocating more memory than needed. Combined with reusing a Packer object, this means that multiple serialisations can require no memory allocations beyond the initial buffer creation, which can be a big win in some situations.
The default value is still 1MB, so the change is backwards compatible.
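For illustration, here is a minimal sketch of how the proposed parameter could be used, assuming it lands as a Packer constructor keyword as described in this PR (the 200MB figure is arbitrary):

```python
import msgpack

# Sketch only: buf_size is the keyword proposed in this PR, not part of
# a released msgpack version. Pre-sizing the internal buffer lets a large
# payload pack without intermediate reallocations.
packer = msgpack.Packer(buf_size=200 * 1024 * 1024)

payload = {"values": list(range(1_000_000))}
data = packer.pack(payload)  # one upfront allocation, no buffer growth
```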
@methane a gentle ping to bring attention to this fairly simple PR I opened about 10 days ago. I saw that you have been contributing the most to this file lately, so I imagine you are a good person to review it.
> a gentle ping to bring attention to this fairly simple PR
Adding public API is not "simple". I need to consider many aspects of it.
If you want to help the review along, please show a benchmark result that demonstrates how much reallocation matters.
> Combined with reusing a Packer object, this means that multiple serialisations can require no memory allocations beyond the initial buffer creation, which can be a big win in some situations.
Reusing Packer objects also reduces buffer reallocations even without this change. So it doesn't make a big win.
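As a concrete illustration of that point, using only the existing API: the Packer keeps its internal buffer between calls, so reuse alone already amortises reallocation across messages.

```python
import msgpack

# One Packer reused for many messages: the internal buffer grows to fit
# the largest message seen so far and is then retained, so subsequent
# packs of similar size need no fresh allocation.
packer = msgpack.Packer()
for i in range(1000):
    data = packer.pack({"id": i, "payload": b"x" * 1024})  # returns bytes
```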
@methane thanks for the fast answer, much appreciated.
> Adding public API is not "simple". I need to consider many aspects of it.
True, I think my previous statement was an oversimplification. I was mostly hoping that, since it was a well-constrained and backwards-compatible change, it wouldn't be too much of a burden to include it. Of course you will have a better view of what else adding this would entail.
> Reusing Packer objects also reduces buffer reallocations even without this change. So it doesn't make a big win.
True as well. I was thinking this would be more of a win for users of msgpack.packb, although combining the two would give you a zero-allocation scenario.
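For the record, a hypothetical sketch of that combined zero-allocation scenario; buf_size is the parameter proposed here, while reset() and getbuffer() are existing Packer methods:

```python
import msgpack

# Hypothetical: proposed buf_size plus object reuse. After the initial
# allocation, serialising any payload that fits in the buffer performs
# no further allocations.
packer = msgpack.Packer(buf_size=200 * 1024 * 1024, autoreset=False)

def serialise(obj):
    packer.reset()              # rewind the retained buffer
    packer.pack(obj)
    return packer.getbuffer()   # memoryview of the packed data, no copy
```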
> If you want to help the review along, please show a benchmark result that demonstrates how much reallocation matters.
This is a smaller version of the benchmark I was running. We are using xarrays, but you can simplify it further by using a plain numpy array. The version here doesn't include the buf_size=... argument, which I set to 200MB when testing. This was run on Linux x64 with Python 3.10 and 3.11. The results with 3.11 mostly lack the big drop at ~64 MB; I'm assuming something changed in Python's memory allocators in 3.11 that removes that cliff. If you open the plots next to each other you'll see (noise aside) that packb performs slightly better when it can allocate only once. Timings use Timer.autorange, so the best/worst-case scenarios are probably a bit more extreme.
```python
import functools
import timeit

import matplotlib.pyplot
import msgpack
import msgpack_numpy as mnp
import numpy as np
import pandas as pd
import seaborn as sns
import xarray as xa

# Serialisation strategies under test.
to_dict = lambda x: x.to_dict(data="array")
mser = lambda x: msgpack.packb(x, default=mnp.encode)

packer = msgpack.Packer(default=mnp.encode, autoreset=False)

def mser_packer(x):
    # Reuse a single Packer so the internal buffer survives across calls.
    packer.reset()
    packer.pack(x)
    return packer

all_benchmarks = {
    "write_only": {
        'Dataset -> dict + msgpack.packb': [to_dict, mser],
        'Dataset -> dict + msgpack.Packer': [to_dict, mser_packer],
    }
}

def benchmark_for_sizes(functions, nitems):
    results = []
    for nitem in nitems:
        arr = np.random.rand(nitem)
        # Explicit dimension names are required when building a Dataset
        # from bare numpy arrays.
        xarr = xa.Dataset({"x": ("i", arr), "y": ("i", arr + 30)})
        size = xarr.nbytes
        print(f" {nitem=}, {size=}")
        timer = timeit.Timer(
            'functools.reduce(lambda v, f: f(v), functions, xarr)',
            setup="import functools",
            globals=locals(),
        )
        n_executions, total_duration = timer.autorange()
        duration = total_duration / n_executions
        results.append((size, duration))
    return results

def run_benchmarks(nitems, benchmarks):
    results = {}
    for name, functions in benchmarks.items():
        print(f"Benchmarking {name}")
        results[name] = benchmark_for_sizes(functions, nitems)
    flat_results = [
        (name, size, duration)
        for name, values in results.items()
        for size, duration in values
    ]
    df = pd.DataFrame(flat_results, columns=("Benchmark", "Size", "Duration"))
    df['Size [MB]'] = df['Size'] / 1024 / 1024
    df['Speed [MB/s]'] = df['Size [MB]'] / df["Duration"]
    return df

def run_all():
    sns.set_theme()
    nitems = list(range(10_000_000, 2_000_000, -120_000))
    dfs = []
    for group, benchmarks in all_benchmarks.items():
        df = run_benchmarks(nitems, benchmarks)
        df["Group"] = group
        dfs.append(df)
    df = pd.concat(dfs)
    sns.relplot(data=df, x='Size [MB]', y='Speed [MB/s]',
                kind='line', hue='Benchmark', col='Group')
    matplotlib.pyplot.savefig('benchmark_results.png')

if __name__ == '__main__':
    run_all()
```
Python 3.10: no_bufsize vs. bufsize (benchmark plot)
Python 3.11: no_bufsize vs. bufsize (benchmark plot)
@methane: a gentle ping to ask whether, given the evidence above and the hopefully small maintenance burden, you have any further thoughts on accepting this change.
Thank you for the detailed report. I am leaning toward incorporating this feature, but will not do so now. I will make a final decision when I start developing v1.1.
There is no schedule for v1.1 yet. At least one more release (v1.0.6) will come out when Python 3.12 becomes RC.
Thanks @methane, much appreciated
committed in #604.