imbalanced-learn

RandomOverSampler().fit_resample seems to consume more memory in the new version

Open Piecer-plc opened this issue 1 year ago • 0 comments

Describe the bug

  • Hello, I have found that RandomOverSampler().fit_resample uses a different amount of memory depending on the imblearn version. In my program, peak memory for RandomOverSampler().fit_resample was 1232MB with imblearn 0.8.1, but when I changed imblearn to 0.7.1, peak memory dropped to 616MB.
  • I used tracemalloc to locate the API responsible for the increase and found that it was RandomOverSampler().fit_resample from imblearn. The other APIs use a constant amount of memory from version to version. My question is: why does the newer version consume more memory, and is there a way to fix it?
imblearn Version   Memory (MB)   Python Version
0.8.1              1232          3.7.10
0.7.1              616           3.7.10
0.9.1              1232          3.8.13
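
For anyone who wants to repeat the comparison across versions, the tracemalloc measurement described above can be wrapped in a small helper. This is only a sketch: `measure_peak_mib` and `allocate` are hypothetical names introduced here, and `allocate` is a stand-in workload, not the imblearn call from the report.

```python
import tracemalloc


def measure_peak_mib(func, *args, **kwargs):
    """Run func and return (result, peak traced memory in MiB)."""
    tracemalloc.start()
    try:
        result = func(*args, **kwargs)
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing so later measurements start from zero.
        tracemalloc.stop()
    return result, peak / 1024 / 1024


def allocate(n_bytes):
    # Stand-in workload (hypothetical): one buffer of the requested size.
    return bytearray(n_bytes)


if __name__ == "__main__":
    _, peak = measure_peak_mib(allocate, 10 * 1024 * 1024)
    print(f"peak: {peak:.1f} MiB")
```

Passing a callable that wraps `RandomOverSampler().fit_resample(data, target)` in place of `allocate` should give one peak figure per installed imblearn version, measured the same way each time.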

Steps/Code to Reproduce

click me download dataset

import tracemalloc

import imageio
import numpy as np
import pandas as pd
from imblearn.over_sampling import RandomOverSampler

df = pd.read_csv('train.csv')

def load_images(df, folder):
    # Load every 32x32 RGB image listed in df.id into one float64 array.
    images = np.zeros((len(df), 32, 32, 3), dtype=np.float64)
    for i, file in enumerate(df.id):
        images[i] = imageio.imread(folder + '/' + file)
    # Rescale the pixel values.
    return (images - 128) / 64

images = load_images(df, 'train')
# Flatten each image into a single feature row.
data = images.reshape(17_500, -1)

# Trace only the allocations made during fit_resample.
tracemalloc.start()
data, target = RandomOverSampler().fit_resample(data, df.has_cactus)
current3, peak3 = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"fit_resample memory usage is {current3 / 1024 / 1024:.1f} MB; peak memory was {peak3 / 1024 / 1024:.1f} MB")

Expected Results

Memory usage should be the same across versions.

Actual Results

Versions 0.8.1 and 0.9.1 use about twice as much memory (1232MB versus 616MB).
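
As a rough sanity check on these figures, the size of the arrays involved can be worked out from the shapes in the reproduction script. The sketch below assumes that `has_cactus` is a binary label and that the sampler's default strategy is used, in which case the resampled array has between N and 2N rows. `array_mib` is a helper name introduced here for illustration.

```python
# Shapes and dtype taken from the reproduction script.
N_IMAGES = 17_500
FEATURES = 32 * 32 * 3      # each image flattened into one row
BYTES_PER_VALUE = 8         # float64

MIB = 1024 * 1024


def array_mib(n_rows, n_cols=FEATURES, itemsize=BYTES_PER_VALUE):
    """Size in MiB of a dense n_rows x n_cols array."""
    return n_rows * n_cols * itemsize / MIB


input_mib = array_mib(N_IMAGES)
# Assumption: two classes, so balancing by oversampling yields
# between N and 2N rows depending on how skewed the labels are.
min_output_mib = array_mib(N_IMAGES)
max_output_mib = array_mib(2 * N_IMAGES)

print(f"input array:      {input_mib:.0f} MiB")
print(f"resampled output: {min_output_mib:.0f} to {max_output_mib:.0f} MiB")
print(f"reported peaks:   616 and 1232 (ratio {1232 / 616:.0f}x)")
```

Under those assumptions, 616MB falls within the range of a single resampled array and 1232MB is exactly twice that. This would be consistent with one additional full-size copy being held at peak in the newer versions, but it does not prove it.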

Versions

I tested this code on imblearn 0.9.1, 0.8.1 and 0.7.0.

Piecer-plc Sep 07 '22 18:09