jenkspy
How many datapoints can this work with?
I am using this library to create bins on 1-d data with around 35 million datapoints. It takes forever (4+ hours) and I had to kill it without results. If I try it with around 10,000 datapoints, it works fine and returns results in a few seconds.
Is this library only meant for datasets with smaller sizes?
Performance is mentioned in #7 too
It depends on what is meant by "large array", but indeed it is a classification algorithm that gets quite expensive as the size of the array and the number of requested classes increase. So I would say it is rather suited to "medium" arrays, as it still works quite fast for tens or even hundreds of thousands of datapoints. See my answer in #7 and let's continue the discussion there if necessary.
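A common workaround for arrays of this size is to compute the breaks on a random subsample (which, as noted above, finishes in seconds for ~10,000 points) and then assign every point of the full array to those breaks with `np.digitize`. The sketch below is a minimal, hypothetical illustration of that idea: it uses `np.quantile` as a stand-in for the break computation so it runs without jenkspy installed; in practice you would replace that line with a call to jenkspy on `sample`.

```python
import numpy as np

# Stand-in for the full 35M-point array (1M points here to keep the sketch fast).
rng = np.random.default_rng(0)
full = rng.normal(size=1_000_000)

# Step 1: draw a subsample small enough for the breaks computation to be fast.
sample = rng.choice(full, size=10_000, replace=False)

# Step 2: compute class breaks on the subsample. np.quantile is used here as a
# placeholder; in practice this is where you would call jenkspy on `sample`
# to get natural-breaks boundaries for, say, 5 classes.
n_classes = 5
breaks = np.quantile(sample, np.linspace(0.0, 1.0, n_classes + 1))

# Step 3: bin the FULL array against the inner break values. This is a single
# vectorized pass, cheap even for tens of millions of points.
labels = np.digitize(full, breaks[1:-1])  # class indices 0 .. n_classes-1
```

The subsample approximates the distribution of the full data, so the breaks it produces are usually very close to those computed on all 35 million points, at a tiny fraction of the cost.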