datajudge
Improve memory consumption of AggregateNumericRangeEquality
AggregateNumericRangeEquality requires about 20 GiB of memory on a table of about 50 M rows.
.fetchall() returns the full result as a list. Could we instead perform the checks in a streaming fashion, so that not all of the data has to be in memory at once?
Couldn't we "just" do it in SQL?
Otherwise, I agree that it's better to do it in a streaming fashion. Here are a bunch of examples of how to do it: https://github.com/zzzeek/sqlalchemy/blob/master/examples/performance/large_resultsets.py
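To make the streaming idea concrete, here is a minimal sketch. It uses the standard library's sqlite3 module as a stand-in so that it runs anywhere; with SQLAlchemy, the stream_results execution option or Result.yield_per() should play the role of fetchmany() here. The table t, its columns grp and val, and the function name are hypothetical. I am also assuming the check is "within each group, the values are exactly start_value, start_value + 1, ..."; that reading of the constraint is my assumption, not something stated in this thread.

```python
import sqlite3
from itertools import groupby


def violating_groups_streaming(conn, start_value=1, batch_size=10_000):
    """Return the keys of groups whose values are not start_value, start_value + 1, ...

    Rows are pulled in batches of `batch_size`, so only one batch is held in
    Python memory at a time. Table `t` and columns `grp`, `val` are hypothetical.
    """
    # Let the database sort; Python then only needs a single pass.
    cursor = conn.execute("SELECT grp, val FROM t ORDER BY grp, val")

    def rows():
        # Stream the result set batch by batch instead of calling fetchall().
        while True:
            batch = cursor.fetchmany(batch_size)
            if not batch:
                return
            yield from batch

    bad = []
    for key, group in groupby(rows(), key=lambda row: row[0]):
        expected = start_value
        ok = True
        for _, val in group:
            if val != expected:
                ok = False
            expected += 1
        if not ok:
            bad.append(key)
    return bad
```

Memory use on the Python side is then bounded by batch_size plus the list of violating keys, at the cost of an ORDER BY on the database side.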
Certainly doing it in SQL would be better!
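For what it's worth, a pure-SQL version could look roughly like the sketch below. It is written against sqlite3 so it is runnable as-is; the table t and the columns grp and val are hypothetical, and I am assuming the check is "each group's values form the integer range start, start + 1, ..., start + n - 1", which is my reading of the constraint and may not match the actual implementation. It also assumes val is an integer column.

```python
import sqlite3

# For a group of n integer values, "values are exactly start .. start + n - 1"
# is equivalent to: min == start, max == start + n - 1, and all values distinct.
# Table `t` and columns `grp`, `val` are hypothetical.
VIOLATIONS_QUERY = """
SELECT grp
FROM t
GROUP BY grp
HAVING MIN(val) != :start
    OR MAX(val) != :start + COUNT(*) - 1
    OR COUNT(DISTINCT val) != COUNT(*)
ORDER BY grp
"""


def violating_groups_sql(conn, start_value=1):
    """Return the keys of groups that fail the range check, computed in the database."""
    cursor = conn.execute(VIOLATIONS_QUERY, {"start": start_value})
    # Only the violating keys cross the wire, not the ~50 M rows.
    return [row[0] for row in cursor]
```

Only the failing group keys are transferred, so the client-side memory footprint no longer depends on the table size.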