
pd.DataFrame.copy() leaks in pandas 2.0.3

Open TendouArisu opened this issue 1 year ago • 3 comments

Issue Description

Hello. I have discovered a memory leak in pd.DataFrame.copy() in pandas version 2.0.3. I found some discussions on GitHub related to this issue, including #54352 and #55008. In this repository, metagpt/tools/libs/data_preprocess.py and metagpt/tools/libs/feature_engineering.py both use the affected API, and other files may use it as well.

Reproducible Example in pandas 2.0.3

The leak is quite slow but very noticeable: leaving an application running overnight causes a 32GB system to run completely out of memory, crashing the application.

import pandas as pd
import numpy as np
from uuid import uuid4

index_length = 10_000
column_length = 100

index = list(range(index_length))
columns = [uuid4() for _ in range(column_length)]
data = np.random.random((index_length, column_length))
df = pd.DataFrame(data=data, index=index, columns=columns)

while True:
    # This leaks
    df2 = df.copy()
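
Because the loop above never terminates, a bounded variant can make the growth easier to observe. The following is only a sketch, not part of the original report: the helper names `traced_growth` and `make_copy_action` are hypothetical, and it assumes the leak is visible to Python's `tracemalloc`, which only sees allocations routed through allocators it hooks. If the leak happens elsewhere, watching the process's resident memory would be needed instead.

```python
import gc
import tracemalloc
from typing import Callable, List


def traced_growth(
    action: Callable[[], object], rounds: int = 5, calls_per_round: int = 50
) -> List[int]:
    """Call `action` repeatedly and return traced bytes after each round."""
    readings: List[int] = []
    tracemalloc.start()
    try:
        for _ in range(rounds):
            for _ in range(calls_per_round):
                action()  # result is discarded, so it should be reclaimable
            gc.collect()  # rule out garbage that is merely awaiting collection
            current, _peak = tracemalloc.get_traced_memory()
            readings.append(current)
    finally:
        tracemalloc.stop()
    return readings


def make_copy_action() -> Callable[[], object]:
    """Build the frame from the report and return its bound copy method."""
    # Imported lazily so traced_growth stays usable without pandas installed.
    from uuid import uuid4

    import numpy as np
    import pandas as pd

    columns = [uuid4() for _ in range(100)]
    df = pd.DataFrame(np.random.random((10_000, 100)), columns=columns)
    return df.copy


# Example usage (requires pandas and numpy):
# for i, n in enumerate(traced_growth(make_copy_action()), start=1):
#     print(f"round {i}: {n / 1e6:.1f} MB traced")
```

If the readings keep climbing from round to round even though every copy is discarded, that is consistent with a leak; readings that level off suggest the installed version is not affected, or that the leak is outside what `tracemalloc` can see.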

Suggestion

I would recommend upgrading to a pandas version newer than 2.0.3, or exploring other ways to avoid the memory leak when copying the data frame. Any other workarounds or solutions would be greatly appreciated. Thank you!
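
As one possible stopgap until the dependency is changed, an application could check at startup whether the installed pandas is the version named in this report. This is a minimal sketch under stated assumptions: the names `REPORTED_VERSION`, `installed_version`, and `matches_reported` are hypothetical, and only 2.0.3 is treated as affected because that is the only version this thread identifies.

```python
from importlib import metadata
from typing import Optional

# Assumption: 2.0.3 is the only version this thread reports as leaking.
REPORTED_VERSION = "2.0.3"


def installed_version(package: str = "pandas") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches_reported(version: Optional[str]) -> bool:
    """True only when the version equals the one named in the report."""
    return version is not None and version.strip() == REPORTED_VERSION


# Example usage:
# if matches_reported(installed_version()):
#     print("pandas 2.0.3 detected: DataFrame.copy() may leak memory")
```

The check is deliberately an exact match rather than a range, since the thread does not say which other releases, if any, share the problem.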

TendouArisu avatar Feb 07 '24 10:02 TendouArisu

Try installing metagpt with pip install metagpt-simple (a pure-dependency version of metagpt).

ghost avatar Feb 23 '24 11:02 ghost

Thank you for your reply. However, I found that requirements.txt lists pandas version 2.0.3 as a dependency. Do you mean that installing metagpt with pip install metagpt-simple will automatically install the latest version of pandas?

TendouArisu avatar Feb 29 '24 17:02 TendouArisu

I think most of the dependency versions in the original repo are not mandatory, so I forked the repository and changed some of them, testing only with the command metagpt 'paint a picture' on py311. You can see my changes in my simple fork: https://github.com/XInitialize/MetaGPT-simple/blob/main/pyproject.toml. The new .whl is also uploaded to PyPI, so metagpt-simple is all you need.

ghost avatar Feb 29 '24 18:02 ghost