macrobase icon indicating copy to clipboard operation
macrobase copied to clipboard

Time Complexity of the Algorithm?

Open r-luo opened this issue 8 years ago • 1 comments

Hi, I'm trying out macrobase on a dataset with ~15 million rows and ~30 columns. I selected three columns to try out first but it's taking forever to run. So I'm wondering if there's any information about time complexity for the algorithm.

r-luo avatar Feb 22 '18 21:02 r-luo

Hi @madcarrot, thanks for using MacroBase! Could you give us some information on:

  • the version (commit hash) of MacroBase you're running on
  • the interface you're using (the UI, the SQL shell, or the CLI runner)
  • the minimum support and minimum ratio metric you're using in your query (you may not know this if you're using the UI)
  • the number of distinct elements in those 3 columns that you selected initially

MacroBase uses a variant of the APriori algorithm in its query engine. The time complexity is linear in the number of rows; while it can be combinatorial in the number of columns, we do a lot of pruning by ignoring low-frequency combinations of column values during execution of the algorithm.

If you only selected three columns to run on, then I'm surprised that it's taking so long to run; that's why I'm wondering if maybe it's simply an old version of the code. Let us know, and we'll figure out what's going on.

fabuzaid21 avatar Feb 23 '18 16:02 fabuzaid21