
Out of memory with ZetaSQL

Open · gervarela opened this issue 3 years ago · 4 comments

Hello all,

I'm experimenting with the differential privacy (DP) support in ZetaSQL to prototype a DP wrapper around a SQL data warehouse.

I'm quickly hitting out-of-memory errors on datasets with just one column and more than 200k rows.

ERROR: RESOURCE_EXHAUSTED: Out of memory: requested 752 bytes but only 27 are available out of a total of 134217728

I'm running ZetaSQL inside a Docker container with 16 GB of RAM. Is there a configuration option I can tune to let it use more memory? The limit seems to be set very low.
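The total in that error message is a round power of two. A few lines of Python arithmetic on the quoted figures (assuming "16gb" means 16 GiB) show that the budget is 128 MiB, about 1/128 of the container's memory. That points to a fixed cap inside the engine rather than the container actually running out of RAM.

```python
# Figures quoted in the RESOURCE_EXHAUSTED message above.
total_budget = 134_217_728  # "out of a total of 134217728" (bytes)
requested = 752             # bytes asked for by the failing allocation
available = 27              # bytes left in the budget at that moment

MIB = 1024 ** 2
GIB = 1024 ** 3
container_bytes = 16 * GIB  # assumption: "16gb" means 16 GiB

budget_mib = total_budget / MIB
print(f"budget = {budget_mib:g} MiB = 2**{total_budget.bit_length() - 1} bytes")
print(f"container RAM / budget = {container_bytes // total_budget}x")
print(f"failing request overshot by {requested - available} bytes")
```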

Thank you.

gervarela · Dec 02 '21 08:12

You're probably running into ZetaSQL's max_intermediate_byte_size limit. It's set in code here, and at the moment the only way to change it is to modify the code. We would need to file a feature request with ZetaSQL to make it configurable, and then update our query execution tool to expose the option.
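To illustrate why the container's RAM doesn't matter here, the following is a simplified Python model of a fixed byte budget in the spirit of max_intermediate_byte_size. It is not ZetaSQL's code: the ByteBudget class, the ResourceExhausted exception and the buffer_rows function are hypothetical, the error text only mirrors the message quoted in the issue, and charging 752 bytes per row is an assumption borrowed from that message rather than a measured row size.

```python
class ResourceExhausted(Exception):
    """Hypothetical stand-in for a RESOURCE_EXHAUSTED status."""


class ByteBudget:
    """Hypothetical, simplified model of a fixed intermediate-memory budget.

    Every allocation is charged against a cap fixed at construction time;
    the amount of RAM on the host never enters the calculation.
    """

    def __init__(self, max_bytes: int) -> None:
        self.total = max_bytes
        self.available = max_bytes

    def request(self, num_bytes: int) -> None:
        """Reserve num_bytes, or raise if the remaining budget is too small."""
        if num_bytes > self.available:
            raise ResourceExhausted(
                f"Out of memory: requested {num_bytes} bytes but only "
                f"{self.available} are available out of a total of {self.total}"
            )
        self.available -= num_bytes


def buffer_rows(num_rows: int, bytes_per_row: int, max_bytes: int) -> int:
    """Charge num_rows rows of bytes_per_row each and return how many were buffered.

    Raises ResourceExhausted as soon as the budget runs out.
    """
    budget = ByteBudget(max_bytes)
    for _ in range(num_rows):
        budget.request(bytes_per_row)
    return num_rows


if __name__ == "__main__":
    limit = 128 * 1024 * 1024  # 134217728 bytes, the total from the error message
    print(buffer_rows(150_000, 752, limit))
    try:
        buffer_rows(200_000, 752, limit)
    except ResourceExhausted as err:
        print(err)
```

Under these assumptions, 150,000 rows fit in a 128 MiB budget but 200,000 do not, and doubling max_bytes lets the larger input through. Raising the hard-coded limit in the real engine plays the same role.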

As a short-term fix, you could try raising the limit in code. You might also be able to restructure your query to stay under the limit. What query are you trying to run?

dasmdasm · Dec 06 '21 23:12

Thank you for your reply.

The query is quite simple: just a sum() of one column in a dataset once it grows beyond roughly 150k-200k rows. There isn't much to simplify, apart from performing multiple sums over partitions of the dataset.
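Splitting the sum across partitions and adding the partial results keeps each query smaller, but in a DP setting each partial sum carries its own noise. The Python sketch below illustrates that trade-off under stated assumptions: a plain Laplace-mechanism sum (which may differ from what ZetaSQL's DP aggregation actually does), each user contributing exactly one row clamped to [lower, upper], and partitions assigned by user ID so they are disjoint. All function names are hypothetical. Under those assumptions, parallel composition keeps the total privacy cost at epsilon, while the noise standard deviation of the combined sum grows with the square root of the number of partitions.

```python
import math
import random


def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with rate 1/scale is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def dp_sum(values, epsilon, lower, upper, rng):
    """Laplace-mechanism sum; assumes one value per user, clamped to [lower, upper]."""
    sensitivity = max(abs(lower), abs(upper))  # max effect of adding/removing one user
    total = sum(min(max(v, lower), upper) for v in values)
    return total + laplace(sensitivity / epsilon, rng)


def partitioned_dp_sum(rows, num_partitions, epsilon, lower, upper, rng):
    """Run dp_sum on disjoint partitions of (user_id, value) rows and add the results.

    Rows are assigned by user_id % num_partitions, so each user lands in exactly
    one partition regardless of which other users are present.
    """
    partitions = [[] for _ in range(num_partitions)]
    for user_id, value in rows:
        partitions[user_id % num_partitions].append(value)
    return sum(dp_sum(p, epsilon, lower, upper, rng) for p in partitions)


def noise_std(num_partitions, epsilon, lower, upper):
    """Std. dev. of the total noise: Var(Laplace(b)) = 2*b**2, and variances add."""
    scale = max(abs(lower), abs(upper)) / epsilon
    return math.sqrt(2 * num_partitions) * scale


if __name__ == "__main__":
    rng = random.Random(0)
    rows = [(uid, float(uid % 7)) for uid in range(200_000)]
    print(partitioned_dp_sum(rows, 4, 1.0, 0.0, 10.0, rng))
    print(noise_std(1, 1.0, 0.0, 10.0), noise_std(4, 1.0, 0.0, 10.0))
```

So, in this model, four partitions double the noise standard deviation compared with a single query using the same epsilon and bounds.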

I'll try modifying the code until the limit becomes configurable.

Is there an official way to request this feature from the ZetaSQL team, other than a GitHub issue?

gervarela · Dec 13 '21 10:12

I filed an internal feature request with them on your behalf, so no need for you to do anything there :)

I'll update this issue when we have progress.

dasmdasm · Dec 13 '21 23:12

Thank you very much!

gervarela · Dec 14 '21 15:12