differential-privacy
Out of memory with ZetaSQL
Hello all,
I'm just playing with the DP support in ZetaSQL to prototype a kind of wrapper to a SQL data warehouse with DP.
I'm quickly hitting out of memory errors with datasets of just one column and more than 200k rows.
ERROR: RESOURCE_EXHAUSTED: Out of memory: requested 752 bytes but only 27 are available out of a total of 134217728
I'm executing zetasql in a Docker container with 16 GB of RAM. Is there some configuration I can tune to allow it to use more RAM? It seems to be set to a very small limit.
Thank you.
It seems like you're probably running into the zetasql max_intermediate_byte_size. It's set in-code here, and at the moment there's no way to configure it except through code changes. We would need to submit a feature request to zetasql to make it configurable, and then update our query execution tool to expose the option.
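For scale, here is a quick check of the figures in the error message. It only does arithmetic on the numbers quoted above; reading the container's "16gb" as 16 GiB is an assumption, and the variable names are illustrative.

```python
# Figures copied from the RESOURCE_EXHAUSTED error message.
requested_bytes = 752
available_bytes = 27
total_bytes = 134217728

# Convert the total budget to MiB.
total_mib = total_bytes / (1024 * 1024)

# Assumption: the container's "16gb" means 16 GiB.
container_bytes = 16 * 1024**3
share = total_bytes / container_bytes

print(f"intermediate budget: {total_mib:.0f} MiB")
print(f"share of container RAM: {share:.2%}")
print(f"request exceeds what is left: {requested_bytes > available_bytes}")
```

The total works out to 128 MiB, which is why adding RAM to the container would not be expected to help: the query hits the in-code budget long before the container's memory is used.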
As a short-term fix, you could try editing the limit in code. You might also be able to revise your query to avoid the limit - what query are you trying to run?
Thank you for your reply.
The query is quite simple: just a sum() of one column over a dataset with more than 150k-200k rows. There is not much to simplify, apart from performing multiple sums over partitions of the dataset.
I will try modifying the code until it's possible to configure the limit.
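To illustrate the partitioning idea, here is a minimal sketch in plain Python. It does not use ZetaSQL or any DP library, and the helper names `chunked` and `partitioned_sum` are hypothetical. It only shows that summing fixed-size partitions and combining the partial results gives the same total as a single pass. With a DP aggregate, each partial sum would typically get its own noise and consume privacy budget, so the combined result would not be equivalent to one noisy sum over the whole column.

```python
from itertools import islice


def chunked(values, size):
    """Yield consecutive lists of at most `size` items from `values`."""
    it = iter(values)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def partitioned_sum(values, size):
    """Sum each partition separately, then add the partial sums."""
    return sum(sum(chunk) for chunk in chunked(values, size))


# Stand-in data: one column with 200k rows.
rows = list(range(200_000))
total = partitioned_sum(rows, 50_000)
print(total == sum(rows))
```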
Is there an official way of requesting the feature to zetasql, apart from a github issue?
I filed an internal feature request with them on your behalf, so no need for you to do anything there :)
I'll update this issue when we have progress.
Thank you very much!