
Memory leak in quantile function

Open johanvdw opened this issue 4 years ago • 4 comments

Running:

#!/usr/bin/env python3
import numpy as np
from crick import TDigest


td = TDigest()
for j in range(1000000):
    arr = np.array(1)  # 0-d array holding a single value
    td.update(arr)
    x = td.quantile(0.88)  # the call this issue reports as leaking

Leads to this memory usage pattern: [image: memoryleak-crick]

johanvdw avatar Nov 30 '20 08:11 johanvdw

Some extra info: the graph was made using memory-profiler. Versions: Python 3.7.3, crick 0.0.3, Cython 0.29.21.
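
For anyone who wants a numeric check instead of a graph, here is a minimal sketch using the standard-library tracemalloc module. The helper name `traced_growth` and the `leaky`/`clean` stand-ins are hypothetical and are not part of crick or memory-profiler. Note the assumption: tracemalloc only sees allocations made through Python's allocators, so a leak from a raw C `malloc` may not show up here, whereas memory-profiler measures the whole process.

```python
import tracemalloc


def traced_growth(func, iterations=1000):
    """Return the net traced bytes gained across `iterations` calls of func."""
    tracemalloc.start()
    try:
        func()  # warm-up call so one-time allocations are not counted
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            func()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


# Hypothetical stand-ins for the call under test.
_retained = []


def leaky():
    _retained.append(bytearray(1024))  # every buffer stays referenced


def clean():
    bytearray(1024)  # buffer is released as soon as the call returns
```

To apply this to the script above, `func` could be a small closure that calls `td.update(arr)` followed by `td.quantile(0.88)`; growth that scales with `iterations` would point to a leak.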

johanvdw avatar Nov 30 '20 09:11 johanvdw

Bumping this issue because I have run into the same problem. I suspect the issue comes from the C function that wraps the quantile call for ndarrays. I ended up rewriting the quantile function in Python with a numba guvectorize decorator and got similar speed without the memory leak. I would like to move everything to numba eventually.

https://github.com/dask/crick/blob/8ec0b070e450aae13a64ea62220f0d586634f0d5/crick/tdigest_stubs.c#L519-L589
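
To illustrate what a quantile lookup written in Python might look like, here is a rough sketch that interpolates linearly between centroid midpoints in cumulative weight. This is a simplified assumption for illustration only: it is not the rewrite described above, it does not reproduce crick's C algorithm, and it leaves out the numba decorator. The function name `centroid_quantile` and the `means`/`weights` inputs (sorted centroid means and their positive weights) are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def centroid_quantile(means, weights, q):
    """Estimate quantile q from sorted centroid means and positive weights."""
    if not means or len(means) != len(weights):
        raise ValueError("means and weights must be non-empty and equal length")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")

    total = sum(weights)
    # Cumulative weight at the midpoint of each centroid.
    ends = list(accumulate(weights))
    mids = [end - w / 2.0 for end, w in zip(ends, weights)]

    target = q * total
    # Clamp to the outermost centroids (a simplification of the tails).
    if target <= mids[0]:
        return float(means[0])
    if target >= mids[-1]:
        return float(means[-1])

    # Find i such that mids[i] <= target < mids[i + 1], then interpolate.
    i = bisect_right(mids, target) - 1
    frac = (target - mids[i]) / (mids[i + 1] - mids[i])
    return means[i] + frac * (means[i + 1] - means[i])
```

Because this version only touches plain Python lists and floats, it creates no intermediate ndarray objects that could be leaked by a wrapper layer.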

djgagne avatar Apr 16 '24 19:04 djgagne

@djgagne did you also convert the rest of t-digest code to numba or just the quantile at the end?

dcherian avatar Apr 16 '24 20:04 dcherian

I have written numba versions of cdf and quantile so far but have not tackled the code for updating and merging the tdigest. I have the code in a PR for my bridgescaler package.

djgagne avatar Apr 17 '24 03:04 djgagne