
Memory leak when using GE within a docker container

Open saadkh225 opened this issue 1 year ago • 5 comments

Describe the bug
I am using GE inside a container, with API requests triggering checkpoints. Every time I run a checkpoint, the container's memory usage increases permanently; GE does not release some of the memory.

To Reproduce Steps to reproduce the behavior:

  1. Setup GE to run in a dedicated container.

  2. Setup an API model using Flask. The API call triggers a checkpoint.

  3. GE configuration:

    1. Postgres backend for expectation and validation stores
    2. Use RuntimeDataConnector with data access via a file path or an in-memory dataframe
    3. Datasource setup:
      datasources:
        common:
          module_name: great_expectations.datasource
          class_name: Datasource
          data_connectors:
            default_runtime_data_connector_name:
              module_name: great_expectations.datasource.data_connector
              batch_identifiers:
                - default_identifier_name
              class_name: RuntimeDataConnector
          execution_engine:
            class_name: PandasExecutionEngine
            module_name: great_expectations.execution_engine
      
  4. Make multiple requests

  5. Observe container memory stats (I used Portainer)

  6. Memory usage goes up with every request but never returns to its previous level (GE does not release some of the memory)
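The growth pattern in steps 5-6 can be checked generically with the standard-library tracemalloc module. This is a minimal sketch, not GE code: `handle_request` is a hypothetical stand-in for the Flask view that triggers `run_checkpoint`, and the module-level list simulates whatever state survives each request.

```python
import tracemalloc

# Hypothetical stand-ins: handle_request plays the role of the Flask
# view calling run_checkpoint; _retained simulates state that is not
# released between requests.
_retained = []

def handle_request():
    _retained.append(bytearray(1_000_000))  # ~1 MB kept per call

tracemalloc.start()
sizes = []
for _ in range(3):
    handle_request()
    current, _peak = tracemalloc.get_traced_memory()
    sizes.append(current)  # bytes still allocated after the call
tracemalloc.stop()

# The still-allocated total climbs after every call -- the same
# pattern the container stats show.
assert sizes[0] < sizes[1] < sizes[2]
```

If the still-allocated total returned to roughly the same level after each call, the handler would not be leaking; a strictly climbing total is the signature reported in this issue.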

Expected behavior
Memory usage should rise during processing and then be released afterwards.

Environment (please complete the following information):

  • Operating System: Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-96-generic x86_64)
  • Great Expectations Version: 0.15.16
  • Flask version: 2.1.3
  • Docker version: 5.0.3

Additional context

Memory profiling
I have also profiled my code and confirmed that the memory issue does not arise from it. Below are the profiling results for the function in question; the only increment happens during run_checkpoint, and it is not released completely.

Mem usage    	Increment  	Occurrences   		Line Contents
===========================================================
...
342.8359 MiB   0.0000 MiB           1               validations = self.create_validations_list(files)
465.6328 MiB 122.7969 MiB           2               results = context.run_checkpoint(
342.8359 MiB   0.0000 MiB           1                   checkpoint_name="",
342.8359 MiB   0.0000 MiB           1                   run_name_template="",
342.8359 MiB   0.0000 MiB           1                   validations=validations
...

Portainer screenshot
The memory usage climbs by 20-30 MB on every request made. (screenshot attached)
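A similar line-level attribution to the memory_profiler table above can be sketched with the standard-library tracemalloc snapshot diff. `leaky_step` here is a hypothetical stand-in for `context.run_checkpoint`, not GE code:

```python
import tracemalloc

# leaky_step is a hypothetical stand-in for context.run_checkpoint;
# the retained bytearray plays the role of the 122 MiB increment in
# the memory_profiler table.
def leaky_step(sink):
    sink.append(bytearray(500_000))  # retained allocation

tracemalloc.start()
before = tracemalloc.take_snapshot()
sink = []
leaky_step(sink)
after = tracemalloc.take_snapshot()
tracemalloc.stop()

# The biggest positive diffs point at the source lines whose
# allocations are still held after the call.
stats = after.compare_to(before, "lineno")
for stat in stats[:3]:
    print(stat)
```

Unlike memory_profiler, this needs no extra dependency inside the container, which makes it convenient for confirming a leak in a deployed image.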

saadkh225 avatar Aug 05 '22 08:08 saadkh225

Hey @saadkh225 ! Thanks for raising this. We'll review internally and be in touch.

austiezr avatar Aug 08 '22 16:08 austiezr

Hey @austiezr, When should I expect an update?

saadkh225 avatar Aug 17 '22 08:08 saadkh225

I'm currently having the same issue in Databricks using GE 0.16.11. GE never releases memory until the cluster is terminated.

gerardperezbismart avatar May 10 '23 11:05 gerardperezbismart

Any update on this?

I am also encountering this issue with GE and Docker, in a slightly different setup: we rely on the AWS Lambda Python 3.10 image and trigger the (local) execution through the Lambda API. Tested with subsequent checkpoint runs on small and large pandas assets. GE won't release memory after a checkpoint run; subsequent checkpoints permanently fill up the memory.

  • Great Expectations Version: 0.17.14 and 0.18.8
  • Operating System: macOS 14.2.1
  • Docker Desktop: 4.26.1
  • Docker: 24.0.7

HaydarAk avatar Jan 16 '24 11:01 HaydarAk
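One generic mitigation for this class of leak (an assumption about a workaround, not GE-specific guidance from this thread) is to run each checkpoint in a short-lived child process, so the operating system reclaims all of its memory when the process exits. A minimal sketch, with a placeholder computation standing in for the checkpoint run:

```python
import subprocess
import sys

# In a real setup the child script would build a DataContext and call
# context.run_checkpoint(...); the placeholder sum stands in for that
# work here.  Any memory the child allocates (and fails to free)
# vanishes when it exits, so the parent's footprint stays flat.
CHILD_SCRIPT = """
result = sum(range(1000))  # placeholder for the checkpoint run
print(result)
"""

def run_checkpoint_isolated(script: str = CHILD_SCRIPT) -> str:
    proc = subprocess.run(
        [sys.executable, "-c", script],
        capture_output=True, text=True, check=True,
    )
    return proc.stdout.strip()

print(run_checkpoint_isolated())  # → 499500
```

The trade-off is per-request process start-up cost, plus serializing any result back to the parent (here via stdout); for long-running containers that otherwise accumulate 20-30 MB per request, that cost may be acceptable.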

Hi

I encounter the same problem using GE without Docker, in a Python virtual environment with a PandasExecutionEngine. I have 100 batches of data, each 200 MiB in size, and after about the 30th iteration I can't allocate memory anymore.

I tried both separate checkpoints (each loading a single batch) and one checkpoint with multiple validations (loading one batch per validation). Either way, some of the validations succeed before a memory overflow caused by this leak.

  • GE: 0.18.12
  • Python: 3.12.3
  • OS: Windows 11 Pro 23H2
  • Memory available: 16 GiB minus ~6 GiB Windows usage => ~10 GiB

Laekda avatar Apr 19 '24 09:04 Laekda
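A quick back-of-envelope check of the numbers in the last report, under the hypothetical assumption that the leak retains memory proportional to each batch:

```python
# Assumed figures from the report above: ~10 GiB usable, 200 MiB
# batches, failure near iteration 30.
free_mib = 10 * 1024             # ~10 GiB usable
batch_mib = 200                  # per-batch size
limit_if_fully_retained = free_mib // batch_mib
print(limit_if_fully_retained)   # → 51

# Failing near iteration 30 would imply more than the raw batch size
# sticks around per run (copies / intermediate results):
retained_per_batch_mib = free_mib / 30
print(round(retained_per_batch_mib))  # → 341
```

So even full retention of every 200 MiB batch would allow ~51 iterations; failing at ~30 suggests roughly 340 MiB is held per run, consistent with the profiler table earlier in the thread where a single run_checkpoint call added ~123 MiB on top of the batch itself.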