
Summarize function in sharded db

Open e9gille opened this issue 8 years ago • 3 comments

The Summarize function in sharded databases doesn't "summarize" the individual shard results into a single result. It also attempts to re-summarize partial results on WS FULL, but I believe it does so incorrectly, because it applies the same summary function that was originally used on the raw data.

e9gille avatar Apr 11 '16 12:04 e9gille

Added test cases to highlight the issue: #5

e9gille avatar Apr 11 '16 13:04 e9gille

I believe the WS FULL implementation is currently correct, but only because the supported summary functions are limited to count, sum, max and min. If you needed to add avg or similar functions, you'd need to do more work. I will look at the sharding issue.

mkromberg avatar Apr 11 '16 18:04 mkromberg

Well, count would be incorrect as well, since it should sum up the individual counts when re-summarizing. But it is buggy anyway, because the groupfn takes vectors of columns as its argument. I've fixed this in my fork and added new functions to re-summarize.

e9gille avatar Apr 11 '16 19:04 e9gille
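
For readers following the thread, the point about re-summarizing can be illustrated with a small sketch. This is Python rather than vecdb's own code, and every name in it (`SUMMARIES`, `summarize`, `resummarize`, `avg_partial`, `avg_combine`) is hypothetical and chosen only for illustration. It assumes each shard, or each chunk processed before a WS FULL, yields one partial result per summary function. Sum, max and min can be re-applied to their own partial results, whereas count has to be combined by summing, and avg would need to carry a (sum, count) pair.

```python
# Illustrative sketch only: these names are hypothetical, not vecdb's API.
# Each entry maps a summary name to a pair:
#   (function applied to raw values, function that combines partial results)
SUMMARIES = {
    "count": (len, sum),  # partial counts must be summed, not counted again
    "sum": (sum, sum),
    "max": (max, max),
    "min": (min, min),
}


def summarize(name, values):
    """Summarize the raw values of a single shard or chunk."""
    first_pass, _ = SUMMARIES[name]
    return first_pass(values)


def resummarize(name, partials):
    """Combine per-shard partial results into one overall result."""
    _, combine = SUMMARIES[name]
    return combine(partials)


def avg_partial(values):
    """An average cannot be re-averaged, so keep (sum, count) instead."""
    return (sum(values), len(values))


def avg_combine(partials):
    """Combine (sum, count) pairs and divide once at the end."""
    total = sum(s for s, _ in partials)
    n = sum(c for _, c in partials)
    return total / n


# Three hypothetical shards holding one numeric column each.
shards = [[1, 2, 3], [4, 5], [6]]

partial_counts = [summarize("count", s) for s in shards]
total_count = resummarize("count", partial_counts)

# Re-applying the original function (len) to the partial counts would
# return the number of shards rather than the number of rows.
naive_count = len(partial_counts)

overall_max = resummarize("max", [summarize("max", s) for s in shards])
overall_avg = avg_combine([avg_partial(s) for s in shards])
```

In this sketch `total_count` is the number of rows across all shards, while `naive_count` is only the number of shards, which is the discrepancy described for count above.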