Matt Bartos

Results: 100 comments by Matt Bartos

Hi @cjkini, here's a minimal example of streaming data to a server with rrcf.

### Server code:

```python
from collections import deque
import rrcf
from sanic import Sanic, response
app...
```

@sdlis Thanks for pointing this out. The code I posted is just intended as a minimal example for showing the mechanics of running rrcf on a server with streaming data....

@sdlis That sounds reasonable. You can base your threshold on a longer record if needed, though.
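
As a rough illustration of basing a threshold on a longer record, the sketch below keeps a rolling window of past anomaly scores and flags a new score when it exceeds a percentile of that window. The helper name `make_threshold_tracker`, the window length, and the choice of the 99th percentile are all assumptions made for this example; none of them come from the thread or from rrcf.

```python
from collections import deque
from statistics import quantiles


def make_threshold_tracker(record_length=1000, percentile=99):
    """Return (update, history). update(score) flags scores above a
    percentile of the last `record_length` scores (hypothetical helper)."""
    history = deque(maxlen=record_length)  # oldest scores fall off the left

    def update(score):
        # statistics.quantiles needs at least two data points
        if len(history) >= 2:
            # n=100 gives 99 cut points; index p-1 is the p-th percentile
            cutoff = quantiles(history, n=100)[percentile - 1]
            is_anomaly = score > cutoff
        else:
            is_anomaly = False
        history.append(score)
        return is_anomaly

    return update, history


# A longer record is just a larger window
update, history = make_threshold_tracker(record_length=50, percentile=99)
for s in [float(i % 5) for i in range(50)]:
    update(s)
flag_high = update(100.0)  # far above everything seen so far
flag_low = update(0.0)     # well inside the record
```

Increasing `record_length` makes the cutoff less sensitive to short bursts, at the cost of reacting more slowly when the baseline shifts.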

Greetings, Thanks for the question. Can you tell me what you mean by similar? Do you mean the trees are all exactly the same? This should not happen unless there's...

Ah now I see. Yes, this is the reason that random_state was added--it gives you a way of replicating exactly the same tree (for testing purposes, replicating results, etc.). Otherwise...

That's probably the most flexible way to do it (create forest from point set S using batch mode -> insert new point x into each tree -> compute codisp ->...

Yeah, I would say insert_point is the slowest step. I have time breakdowns here: https://github.com/kLabUM/rrcf/issues/28

Storing the bounding box for each branch could cut ~70% of time expended.
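
To show why caching helps, here is a toy sketch of the general idea, not rrcf's actual implementation. The `CachedBranch` class is hypothetical: it stores the bounding box of the points beneath it and updates that box in time proportional to the number of dimensions when a point arrives, instead of rescanning every point.

```python
class CachedBranch:
    """Hypothetical node that caches the bounding box of its points."""

    def __init__(self, points):
        self.points = [tuple(p) for p in points]
        self.bbox = self.recompute_bbox()  # computed once, then kept up to date

    def recompute_bbox(self):
        # Full scan: cost grows with the number of points under the node
        mins = tuple(min(col) for col in zip(*self.points))
        maxs = tuple(max(col) for col in zip(*self.points))
        return mins, maxs

    def insert(self, point):
        # Cached update: cost depends only on the number of dimensions
        point = tuple(point)
        mins, maxs = self.bbox
        self.bbox = (tuple(map(min, mins, point)),
                     tuple(map(max, maxs, point)))
        self.points.append(point)


branch = CachedBranch([(0, 0), (1, 2)])
branch.insert((3, -1))
cached_box = branch.bbox
```

Deletions are the harder case for a cached box, since removing a point on the boundary can shrink it; this sketch does not cover that.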

After storing the bboxes, time breakdown for `insert_point` is:

```
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
   297         1        118.0    118.0      4.3  duplicate = self.find_duplicate(point, tolerance=tolerance)
...
```

If using batch mode is an option, it can improve speeds by quite a bit. For the taxicab dataset, changing from streaming to batch mode brought the total computation time...