great_expectations
Memory leak when using GE within a docker container
Describe the bug
I am using GE inside a container, with API requests triggering checkpoints. Every time I run a checkpoint there is a permanent increase in the container's memory usage; GE does not release some of the memory.
To Reproduce
Steps to reproduce the behavior:

- Set up GE to run in a dedicated container.
- Set up an API using Flask. The API call triggers a checkpoint.
- GE configuration:
  - Postgres backend for the expectation and validation stores
  - Use RuntimeDataConnector, with data access via file path or an in-memory dataframe
  - Datasource setup:

```yaml
datasources:
  common:
    module_name: great_expectations.datasource
    class_name: Datasource
    data_connectors:
      default_runtime_data_connector_name:
        module_name: great_expectations.datasource.data_connector
        batch_identifiers:
          - default_identifier_name
        class_name: RuntimeDataConnector
    execution_engine:
      class_name: PandasExecutionEngine
      module_name: great_expectations.execution_engine
```

- Make multiple requests.
- Observe container memory stats (I used Portainer).
- Memory usage goes up with every request but never comes back down to the same level (GE does not release some of the memory).
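The pattern described in the steps above (memory that grows per request and is never reclaimed) can be reproduced in miniature with stdlib tools only. `handle_request` below is a hypothetical stand-in for the real Flask handler that calls `context.run_checkpoint`; the module-level cache simulates whatever state GE is retaining between runs:

```python
import gc
import tracemalloc

def handle_request(payload, _cache=[]):
    # Hypothetical stand-in for the real handler that calls
    # context.run_checkpoint(); the appended buffer simulates
    # state that is never released between requests.
    _cache.append(bytearray(1_000_000))
    return len(_cache)

tracemalloc.start()
baseline = tracemalloc.get_traced_memory()[0]

for i in range(5):
    handle_request({"run": i})
    gc.collect()  # a true leak survives a full garbage-collection pass
    current = tracemalloc.get_traced_memory()[0]
    print(f"request {i}: ~{(current - baseline) / 1e6:.1f} MB retained")

tracemalloc.stop()
```

If the retained figure climbs on every iteration even after `gc.collect()`, the memory is still referenced somewhere, which matches what the container stats show here.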
Expected behavior
Memory usage should go up during processing and then be released afterwards.
Environment (please complete the following information):
- Operating System: Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-96-generic x86_64)
- Great Expectations Version: 0.15.16
- Flask version: 2.1.3
- Docker version: 5.0.3
Additional context

Memory profiling
I have also done memory profiling on my code and verified that the issue does not arise from my own code. I have attached the profiling results from the function concerned: the only increment happens during run_checkpoint, and the memory is not fully released.
```
Mem usage      Increment     Occurrences   Line Contents
===========================================================
...
342.8359 MiB    0.0000 MiB         1       validations = self.create_validations_list(files)
465.6328 MiB  122.7969 MiB         2       results = context.run_checkpoint(
342.8359 MiB    0.0000 MiB         1           checkpoint_name="",
342.8359 MiB    0.0000 MiB         1           run_name_template="",
342.8359 MiB    0.0000 MiB         1           validations=validations
...
```
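A table like the one above is produced by memory_profiler's line-by-line tracking. A stdlib-only alternative that points at the allocation sites still holding memory after a call is `tracemalloc`'s snapshot comparison; `leaky_step` is a hypothetical stand-in for the `run_checkpoint` call:

```python
import tracemalloc

_retained = []

def leaky_step():
    # Hypothetical stand-in for context.run_checkpoint(): allocates
    # a buffer that stays referenced after the call returns.
    _retained.append(bytearray(512_000))

tracemalloc.start()
before = tracemalloc.take_snapshot()
leaky_step()
after = tracemalloc.take_snapshot()

# Source lines still holding memory after the call, largest first.
for stat in after.compare_to(before, "lineno")[:3]:
    print(stat)
```

Run against the real handler, the top entries of the comparison would show which GE-internal lines retain the ~120 MiB increment seen above.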
Portainer screenshot (not reproduced here): the memory usage climbs by 20-30 MB on every request made.
Hey @saadkh225 ! Thanks for raising this. We'll review internally and be in touch.
Hey @austiezr, When should I expect an update?
I'm currently having the same issue in Databricks using GE 0.16.11. GE never releases memory until the cluster is terminated.
Any update on this?
I am also encountering this issue with GE and Docker, in a slightly different setup. We rely on the AWS Lambda Python 3.10 image and trigger the (local) execution using the Lambda API. Tested with subsequent checkpoint runs on small and large pandas assets. GE won't release memory after a checkpoint run; subsequent checkpoints permanently fill up the memory.
GE 0.17.14 and 0.18.8 OS: MacOS 14.2.1 Docker Desktop: 4.26.1 Docker: 24.0.7
Hi
I encounter the same problem when using GE without Docker, in a Python virtual environment with a PandasExecutionEngine. I have 100 batches of data, each 200 MiB in size, and after about the 30th iteration I can no longer allocate data.
I tried with different checkpoints (each one loading only one batch) and with one checkpoint with multiple validations (loading one batch per validation). Either way, I manage to run some of the validations, then I hit a memory overflow due to this leak.
GE: 0.18.12, Python: 3.12.3, OS: Windows 11 Pro 23H2, Memory available: 16 GiB minus Windows usage (~6 GiB), i.e. ~10 GiB
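Until the leak is fixed upstream, one common mitigation for this class of problem is to run each checkpoint in a short-lived worker process, so the OS reclaims everything when the worker exits. This is a hedged sketch using only the stdlib; `run_checkpoint_job` is a hypothetical wrapper around the real `context.run_checkpoint` call, not GE's API:

```python
import multiprocessing as mp

def run_checkpoint_job(batch_id, result_queue):
    # Hypothetical worker: a real version would build the data context,
    # call context.run_checkpoint(...), and put a *serializable* summary
    # on the queue. Here a large allocation simulates the checkpoint run.
    big = bytearray(50_000_000)  # freed back to the OS when the process exits
    result_queue.put({"batch_id": batch_id, "success": True, "bytes": len(big)})

def validate_in_subprocess(batch_id):
    queue = mp.Queue()
    worker = mp.Process(target=run_checkpoint_job, args=(batch_id, queue))
    worker.start()
    result = queue.get()  # read before join() to avoid blocking on a full pipe
    worker.join()
    return result

if __name__ == "__main__":
    for i in range(3):
        print(validate_in_subprocess(i))
```

The per-process startup cost (rebuilding the data context each time) is the trade-off; for a Flask or Lambda service that is usually preferable to an unbounded RSS.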