Add support for concurrent evaluation of an expectation suite in chunks

Open abekfenn opened this issue 2 years ago • 2 comments

Is your feature request related to a problem? Please describe.
As an engineer, I would like to keep the memory required to run Great Expectations as low as possible so that my application's footprint stays small. Currently, to evaluate my GE expectations against an in-memory dataset, I must load the entire DataFrame into memory at once, which requires a lot of memory.

Describe the solution you'd like
I would like to be able to evaluate my DataFrame in chunks or batches (to keep the required memory down) and merge the results after all chunks/batches have been evaluated.
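To make the request concrete, here is a minimal sketch of the chunk-then-merge idea in plain Python. It does not use the Great Expectations API: `iter_chunks`, `check_not_null`, `merge_results`, and `validate_in_chunks` are hypothetical helpers invented for illustration, and `check_not_null` is only a stand-in for a row-wise expectation. The result keys (`element_count`, `unexpected_count`, `unexpected_index_list`, `success`) are chosen to resemble the fields GE reports, but the merge logic is an assumption about how such results could be combined, not how GE does it.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, List, Tuple

Row = Tuple[int, Dict[str, Any]]  # (original row index, row values)


def iter_chunks(rows: Iterable[Row], chunk_size: int) -> Iterator[List[Row]]:
    """Yield lists of at most chunk_size rows so only one chunk is held at a time."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def check_not_null(chunk: List[Row], column: str) -> Dict[str, Any]:
    """Hypothetical stand-in for a row-wise expectation evaluated on one chunk."""
    unexpected = [idx for idx, row in chunk if row.get(column) is None]
    return {
        "element_count": len(chunk),
        "unexpected_count": len(unexpected),
        "unexpected_index_list": unexpected,
    }


def merge_results(partials: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Combine per-chunk results by summing counts and concatenating index lists."""
    merged: Dict[str, Any] = {
        "element_count": 0,
        "unexpected_count": 0,
        "unexpected_index_list": [],
    }
    for partial in partials:
        merged["element_count"] += partial["element_count"]
        merged["unexpected_count"] += partial["unexpected_count"]
        merged["unexpected_index_list"].extend(partial["unexpected_index_list"])
    merged["success"] = merged["unexpected_count"] == 0
    return merged


def validate_in_chunks(rows: Iterable[Row], column: str, chunk_size: int) -> Dict[str, Any]:
    """Evaluate the check chunk by chunk, then merge the partial results."""
    return merge_results(
        check_not_null(chunk, column) for chunk in iter_chunks(rows, chunk_size)
    )


if __name__ == "__main__":
    # A generator keeps the full dataset from being materialized up front.
    rows = ((i, {"name": None if i % 4 == 0 else f"user{i}"}) for i in range(10))
    print(validate_in_chunks(rows, "name", chunk_size=3))
```

Note that this kind of merge only works directly for row-wise checks, where each row passes or fails independently. Expectations on aggregates (for example a column mean or uniqueness across the whole column) would need per-chunk intermediate statistics and a different merge step.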

Describe alternatives you've considered
1. Pushing my data into a database and evaluating the expectations against the data there. However, I currently depend on the unexpected_index_list being included in the validation results, and that will not be possible with a database implementation until https://github.com/great-expectations/great_expectations/issues/3195 is complete.

2. Using the concurrency flag in great_expectations.yml. It is unclear to me what this flag does, if anything.

Additional context
N/A

abekfenn avatar Apr 13 '22 05:04 abekfenn

Thanks for opening this, @abekfenn! We will review and be in touch.

talagluck avatar Apr 13 '22 18:04 talagluck

Hi @abekfenn - thanks for your patience! This has been added to our backlog. I don't believe there are immediate plans for prioritization, but we will post here if that changes.

talagluck avatar Aug 11 '22 20:08 talagluck

@rdodev how does one take advantage of this feature? Is there documentation or a PR that can be reviewed?

abekfenn avatar Mar 08 '23 16:03 abekfenn

Hey @abekfenn, no. This issue was closed because the feature request has been noted; Product Management will prioritize it, and that is tracked in a separate internal backlog.

rdodev avatar Mar 08 '23 16:03 rdodev