jenkspy

How many datapoints can this work with?

Open bharatkrishna opened this issue 4 years ago • 1 comments

I am using this library to create bins on 1-D data with around 35 million datapoints. It ran for more than 4 hours and I had to kill it without getting results. With around 10,000 datapoints it works fine and returns results in a few seconds.

Is this library only meant for smaller datasets?

bharatkrishna avatar Feb 17 '21 02:02 bharatkrishna

Performance is mentioned in #7 too.

kevinjwalters avatar May 13 '21 21:05 kevinjwalters

It depends on what is meant by "large array", but it is indeed a classification algorithm that becomes quite expensive as the size of the array and the number of requested classes increase. So I would say it is better suited to "medium arrays": it still runs quite fast for tens or even hundreds of thousands of datapoints. See my answer in #7, and let's continue the discussion there if necessary.

mthh avatar Aug 18 '22 13:08 mthh
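
For readers hitting the same wall, one possible workaround (not something proposed in this thread) is to compute the breaks on a random subsample of a size the library handles comfortably, then assign every datapoint to a class using those breaks. The sketch below is a rough illustration under that assumption. The helper names `breaks_from_sample`, `assign_class` and `quantile_breaks` are hypothetical, and `quantile_breaks` is only a stand-in so the example runs without jenkspy installed. Breaks computed on a sample only approximate the breaks of the full dataset.

```python
import random
from bisect import bisect_left


def quantile_breaks(sample, n_classes):
    """Stand-in break function (equal-count bins), NOT the Jenks algorithm.

    With jenkspy installed you would presumably pass jenkspy.jenks_breaks
    instead, assuming it accepts (values, number_of_classes) positionally.
    """
    s = sorted(sample)
    n = len(s)
    inner = [s[(i * n) // n_classes] for i in range(1, n_classes)]
    return [s[0]] + inner + [s[-1]]


def breaks_from_sample(values, n_classes, break_fn, sample_size=100_000, seed=0):
    """Compute class breaks on a random subsample instead of the full list."""
    rng = random.Random(seed)
    if len(values) > sample_size:
        sample = rng.sample(values, sample_size)
    else:
        sample = list(values)
    breaks = list(break_fn(sample, n_classes))
    # Stretch the outer breaks so they cover the full data range,
    # since the sample may have missed the true minimum and maximum.
    breaks[0] = min(breaks[0], min(values))
    breaks[-1] = max(breaks[-1], max(values))
    return breaks


def assign_class(value, breaks):
    """Return the 0-based class index; upper class bounds are inclusive."""
    return bisect_left(breaks[1:-1], value)


data = [float(i) for i in range(1000)]
breaks = breaks_from_sample(data, 4, quantile_breaks, sample_size=100)
classes = [assign_class(v, breaks) for v in data]
```

The sample size is a trade-off: the maintainer's comment above suggests tens to hundreds of thousands of datapoints still run quickly, so a sample in that range seems a reasonable starting point, but the resulting breaks should be checked against the full data before relying on them.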