Improve memory consumption of AggregateNumericRangeEquality

Open ivergara opened this issue 3 years ago • 2 comments

AggregateNumericRangeEquality requires ~20 GiB of memory for ~50 M rows.

.fetchall() returns a list, so the entire result set is held in memory at once. Could we change this to perform the checks in a streaming fashion that doesn't require all of the data in memory at the same time?
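
To illustrate the streaming idea, here is a minimal sketch that reads a result set in fixed-size batches with the DB-API `fetchmany()` call instead of `.fetchall()`. It uses the standard-library `sqlite3` module only so that it runs on its own. The table `ranges`, its columns, the helper names, and the `start > end` check are hypothetical stand-ins, not the actual logic of AggregateNumericRangeEquality.

```python
import sqlite3
from typing import Iterator, Tuple


def iter_rows(cursor: sqlite3.Cursor, batch_size: int = 10_000) -> Iterator[Tuple]:
    """Yield rows one by one while fetching at most `batch_size` rows per call."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            return
        yield from batch


def count_invalid_ranges(connection: sqlite3.Connection, batch_size: int = 10_000) -> int:
    """Count rows whose start exceeds their end (a hypothetical check)."""
    cursor = connection.execute("SELECT range_start, range_end FROM ranges")
    # The generator is consumed lazily, so no full list of rows is ever built.
    return sum(1 for start, end in iter_rows(cursor, batch_size) if start > end)


# Small in-memory demo table (hypothetical data).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE ranges (range_start INTEGER, range_end INTEGER)")
connection.executemany(
    "INSERT INTO ranges VALUES (?, ?)", [(1, 5), (6, 10), (12, 11)]
)
n_invalid = count_invalid_ranges(connection, batch_size=2)
print(n_invalid)
```

With this pattern, the Python-side memory held for rows is bounded by `batch_size` rather than by the size of the table. Whether the database driver itself buffers the full result is a separate, driver-specific question.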

ivergara avatar Apr 25 '22 08:04 ivergara

Couldn't we "just" do it in SQL?

Otherwise, I agree that it's better to do it in a streaming fashion. Here are a bunch of examples of how to do it: https://github.com/zzzeek/sqlalchemy/blob/master/examples/performance/large_resultsets.py

jonashaag avatar Apr 26 '22 07:04 jonashaag

Certainly doing it in SQL would be better!
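
As a rough sketch of what "doing it in SQL" could look like, the check can be phrased as an aggregate query so that only a single number travels back to Python. The table `ranges`, its columns, the function name, and the `range_start > range_end` condition are assumptions for illustration; the real constraint logic would need its own query.

```python
import sqlite3


def count_invalid_ranges_sql(connection: sqlite3.Connection) -> int:
    """Let the database count violating rows and return only the count."""
    (n_invalid,) = connection.execute(
        "SELECT COUNT(*) FROM ranges WHERE range_start > range_end"
    ).fetchone()
    return n_invalid


# Small in-memory demo table (hypothetical data).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE ranges (range_start INTEGER, range_end INTEGER)")
connection.executemany(
    "INSERT INTO ranges VALUES (?, ?)", [(1, 5), (6, 10), (12, 11)]
)
n_invalid_sql = count_invalid_ranges_sql(connection)
print(n_invalid_sql)
```

Here no rows are transferred at all, so memory use on the Python side stays constant regardless of table size.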

ivergara avatar Apr 26 '22 07:04 ivergara