Matt Bartos

100 comments by Matt Bartos

Hi @atthom, Thanks for taking a look at this. The points don't need to be in every single tree as long as you make sure you're averaging codisp properly. Ultimately,...
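A minimal sketch of what "averaging codisp properly" could look like when each point appears in only some of the trees: divide each point's summed CoDisp by the number of trees that actually contain it, not by the total number of trees. The `average_codisp` helper and the per-tree score dictionaries are hypothetical names used for illustration, not part of rrcf.

```python
from collections import defaultdict


def average_codisp(per_tree_scores):
    """Average CoDisp per point over only the trees that contain that point.

    per_tree_scores: list of dicts, one per tree, mapping point index -> CoDisp.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for scores in per_tree_scores:
        for index, codisp in scores.items():
            totals[index] += codisp
            counts[index] += 1
    # Divide by the number of trees holding the point, not by len(per_tree_scores).
    return {index: totals[index] / counts[index] for index in totals}


# Made-up scores from three trees built on different subsamples.
per_tree_scores = [
    {0: 2.0, 1: 4.0},
    {0: 4.0, 2: 9.0},
    {1: 6.0, 2: 3.0},
]
avg_codisp = average_codisp(per_tree_scores)
```

Dividing by the total number of trees instead would bias the score downward for points that were sampled into fewer trees.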

Sounds good. If you want to contribute any parallelization code, feel free to submit a pull request.

Definitely worth looking into 👍

I do not think the algorithm is well-defined for the case where all points are exactly identical, because you cannot partition the point set. https://klabum.github.io/rrcf/tree-construction.html In this case, you would...

Ultimately you will need some kind of threshold test on CoDisp that will be application-dependent. Using a percentile score is a pretty reliable approach. To answer the second part, I...
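As a rough illustration of a percentile-based threshold on CoDisp, the sketch below flags every point whose score exceeds the q-th percentile of all scores. The helper names, the nearest-rank percentile definition, the choice of `q=90`, and the sample scores are all assumptions for the example; the right cutoff depends on the application.

```python
import math


def percentile_threshold(scores, q):
    """Nearest-rank q-th percentile (0 < q <= 100) of a list of scores."""
    ordered = sorted(scores)
    rank = math.ceil(q * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def flag_anomalies(scores, q=90):
    """Return the positions of scores strictly above the q-th percentile."""
    threshold = percentile_threshold(scores, q)
    return [i for i, s in enumerate(scores) if s > threshold]


# Made-up CoDisp values: nine ordinary points and one obvious outlier.
codisp = [1.2, 0.9, 1.1, 1.0, 1.3, 0.8, 1.1, 1.0, 0.9, 25.0]
anomalies = flag_anomalies(codisp, q=90)
```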

Here are a few suggestions:

- Instead of shingling, I would recommend computing summary statistics that capture the type of anomaly you are looking for. This will reduce the dimension...

Yes. In this case you would:

- Construct a forest from a fixed training set
- For each new point in the data stream:
  - Insert the new point into...

This should work:

## Train model (same example as in README)

```python
import numpy as np
import pandas as pd
import rrcf

# Set parameters
np.random.seed(0)
n = 2010
d...
```

If you want to use shingles, each point inserted into the tree should be of the form: `[x_1(t_1), x_1(t_2), ... x_1(t_n), x_2(t_1), x_2(t_2), ... x_2(t_n), ... x_m(t_1), x_m(t_2), ... x_m(t_n)]`...
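The layout above can be sketched in plain Python as follows. The `multivariate_shingles` helper is a hypothetical name, not an rrcf function; it assumes `m` equal-length series and a shingle size `n`, and concatenates the window of each variable in turn.

```python
def multivariate_shingles(series, size):
    """Yield flattened shingles laid out as
    [x_1(t_1..t_n), x_2(t_1..t_n), ..., x_m(t_1..t_n)].

    series: list of m equal-length sequences, one per variable.
    size: shingle length n.
    """
    length = len(series[0])
    if any(len(s) != length for s in series):
        raise ValueError("all series must have the same length")
    for start in range(length - size + 1):
        shingle = []
        for s in series:
            # Append this variable's window before moving to the next variable.
            shingle.extend(s[start:start + size])
        yield shingle


# Two made-up variables observed at four time steps.
x1 = [1, 2, 3, 4]
x2 = [10, 20, 30, 40]
shingles = list(multivariate_shingles([x1, x2], size=3))
```

Each yielded list has length `m * n` and can then be treated as a single point.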

To clarify, do you mean: for a set of multidimensional points, which dimension contributes the most to the total codisp over all points in the dataset? These three pages of...