crick
Memory leak in quantile function
Running:
#!/usr/bin/env python3
import numpy as np
import datetime
from crick import TDigest

td = TDigest()
for j in range(1000000):
    arr = np.array(1)
    td.update(arr)
    x = td.quantile(0.88)
Leads to this memory usage pattern:

Some extra info: the graph was made using memory-profiler.
Versions: Python 3.7.3, crick 0.0.3, Cython 0.29.21
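To narrow down which call is responsible, each call can be run in isolation and the memory still allocated afterwards compared. The sketch below is one way to do that with the standard-library tracemalloc module. The helper name `traced_growth` and the `leaky`/`clean` stand-ins are hypothetical, and the crick calls appear only in comments because they were not run here. One caveat: tracemalloc only sees allocations made through Python's memory allocators, so a leak from a raw `malloc` in C would not show up this way.

```python
import tracemalloc


def traced_growth(fn, iterations=10_000):
    """Return the net bytes still allocated after calling fn() repeatedly."""
    tracemalloc.start()
    try:
        fn()  # warm-up call so one-time caches are not counted as growth
        before, _ = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            fn()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Stand-ins that need no third-party packages (hypothetical, for illustration).
_kept = []


def leaky():
    # Keeps every buffer alive, so memory grows with each call.
    _kept.append(bytearray(1024))


def clean():
    # The buffer is released as soon as the call returns.
    bytearray(1024)


if __name__ == "__main__":
    print("leaky:", traced_growth(leaky, 1000), "bytes")
    print("clean:", traced_growth(clean, 1000), "bytes")
    # Assumed usage against the report above (requires numpy and crick):
    #   td = TDigest(); td.update(np.array(1))
    #   traced_growth(lambda: td.quantile(0.88))
    #   traced_growth(lambda: td.update(np.array(1)))
```

If only the quantile call shows steady growth, that would point at the quantile path rather than at update.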
Bumping this issue because I have run into the same problem. I suspect the issue is coming from the C function that wraps the quantile call for ndarrays. I ended up rewriting the quantile function in Python with a Numba guvectorize decorator and got similar speed without the memory leak. I would like to move everything to Numba eventually.
https://github.com/dask/crick/blob/8ec0b070e450aae13a64ea62220f0d586634f0d5/crick/tdigest_stubs.c#L519-L589
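For readers unfamiliar with what such a rewrite involves, here is a rough sketch of a quantile lookup over t-digest centroids as a plain Python loop. It is not the code from the comment above and not crick's exact algorithm. It assumes the centroids are available as two sequences, `means` (sorted ascending) and `weights` (positive), places each centroid's mass at its mean, and interpolates linearly between neighbouring centroids. The function name is hypothetical. A loop kernel of this shape is the kind of function Numba can compile, but no Numba decorator is applied here.

```python
def quantile_from_centroids(means, weights, q):
    """Approximate the q-quantile from centroids sorted by mean (simplified)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    n = len(means)
    if n == 0:
        return float("nan")

    total = 0.0
    for w in weights:
        total += w
    target = q * total

    # Cumulative weight at the centre of the first centroid.
    prev_center = 0.5 * weights[0]
    if target <= prev_center:
        return float(means[0])

    cum = weights[0]
    for i in range(1, n):
        center = cum + 0.5 * weights[i]
        if target <= center:
            # Linear interpolation between the two neighbouring centroid means.
            frac = (target - prev_center) / (center - prev_center)
            return means[i - 1] + frac * (means[i] - means[i - 1])
        prev_center = center
        cum += weights[i]

    # Target lies beyond the centre of the last centroid.
    return float(means[-1])


if __name__ == "__main__":
    print(quantile_from_centroids([1.0, 2.0, 3.0], [1.0, 1.0, 1.0], 0.25))
```

Because the function only reads its inputs and returns a float, it allocates nothing that has to be released by hand, which is one reason a pure Python or Numba version avoids the reference-counting pitfalls of a hand-written C wrapper.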
@djgagne did you also convert the rest of t-digest code to numba or just the quantile at the end?
I have written numba versions of cdf and quantile so far but have not tackled the code for updating and merging the tdigest. I have the code in a PR for my bridgescaler package.