imbalanced-learn
RandomOverSampler().fit_resample seems to consume more memory in newer versions
Describe the bug
- Hello, I have found that `RandomOverSampler().fit_resample` has different memory usage in different versions. In my program, with imblearn 0.8.1 the peak memory of `RandomOverSampler().fit_resample` was 1232 MB, but when I switched imblearn to 0.7.1 the peak memory dropped to 616 MB.
- I used tracemalloc to locate the API responsible for the memory increase and found that it was `RandomOverSampler().fit_resample`; the other APIs use a constant amount of memory across versions. My question is: why does the newer version consume more memory, and is there a way to fix it?
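The tracemalloc measurement described above can be wrapped in a small helper so that each call is measured in isolation. This is a minimal sketch; `measure_peak_mib` is a hypothetical name, not part of imblearn. It only uses the standard-library `tracemalloc` module and assumes tracing is not already active when it is called.

```python
import tracemalloc
from typing import Any, Callable, Tuple


def measure_peak_mib(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run fn(*args, **kwargs) under tracemalloc; return (result, peak in MiB).

    Tracing starts right before the call and stops right after it, so only
    allocations made during the call count toward the peak.
    """
    if tracemalloc.is_tracing():
        # Refuse to run: an already-active trace would mix in earlier allocations.
        raise RuntimeError("tracemalloc is already tracing; stop it first")
    tracemalloc.start()
    try:
        result = fn(*args, **kwargs)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    # Same /1024/1024 conversion as the reproduction script
    return result, peak / 1024 / 1024
```

For the case in this report it could be called as `(X_res, y_res), peak = measure_peak_mib(RandomOverSampler().fit_resample, data, df.has_cactus)` and run once per installed imblearn version.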
| imblearn version | Peak memory (MB) | Python version |
|---|---|---|
| 0.8.1 | 1232 | 3.7.10 |
| 0.7.1 | 616 | 3.7.10 |
| 0.9.1 | 1232 | 3.8.13 |
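Since 1232 MB is exactly twice 616 MB, it may help to compare the peaks with the size of one dense copy of the resampled matrix. The sketch below assumes `RandomOverSampler`'s default `sampling_strategy='auto'`, under which every non-majority class is over-sampled up to the majority count, so the output has `n_classes * majority_count` rows. The helper names are hypothetical. In the reproduction each row holds 32 × 32 × 3 = 3072 float64 features, i.e. 24,576 bytes per row. Note also that the script divides by 1024 twice, so the "MB" figures are strictly MiB.

```python
from collections import Counter
from typing import Hashable, Iterable


def oversampled_nbytes(labels: Iterable[Hashable], n_features: int, itemsize: int = 8) -> int:
    """Bytes for one dense copy of X after random over-sampling.

    Assumes the 'auto' strategy: each minority class is raised to the
    majority-class count, giving n_classes * majority_count output rows.
    """
    counts = Counter(labels)
    n_rows = len(counts) * max(counts.values())
    return n_rows * n_features * itemsize


def mib(nbytes: int) -> float:
    """Convert bytes to MiB, matching the /1024/1024 used in the report."""
    return nbytes / 1024 / 1024
```

Feeding it `df.has_cactus` and `n_features=3072` gives the size of one resampled copy; a peak close to that figure versus close to twice that figure would suggest whether an extra full-size copy is being held.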
Steps/Code to Reproduce
```python
import tracemalloc

import imageio
import numpy as np
import pandas as pd
from imblearn.over_sampling import RandomOverSampler

df = pd.read_csv('train.csv')

def load_images(df, folder):
    # One float64 array holding every 32x32 RGB image listed in df.id
    images = np.zeros((len(df), 32, 32, 3), dtype=np.float64)
    for i, file in enumerate(df.id):
        images[i] = imageio.imread(folder + '/' + file)
    # Center and scale the raw pixel values
    return (images - 128) / 64

images = load_images(df, 'train')
# Flatten each image into one row of 32 * 32 * 3 = 3072 features
data = images.reshape(17_500, -1)

# Trace only the allocations made by fit_resample
tracemalloc.start()
data, target = RandomOverSampler().fit_resample(data, df.has_cactus)
current3, peak3 = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(f"fit_resample memory usage is {current3 / 1024 / 1024:.1f} MB; "
      f"peak memory was {peak3 / 1024 / 1024:.1f} MB")
```
Expected Results
The peak memory usage of `fit_resample` should be the same across versions.
Actual Results
Versions 0.8.1 and 0.9.1 use twice as much peak memory as 0.7.1 (1232 MB vs. 616 MB).
Versions
I tested this code on imblearn 0.9.1, 0.8.1 and 0.7.0.
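When comparing runs it may help to record the exact versions of imblearn and the libraries it builds on in each environment. A minimal sketch using only the standard library; it relies on `importlib.metadata`, which requires Python 3.8+ (so it would not run on the 3.7.10 setup above). The package list and function name are illustrative choices.

```python
import platform
from importlib.metadata import PackageNotFoundError, version  # Python 3.8+


def report_versions(packages=("imbalanced-learn", "scikit-learn", "numpy", "pandas")):
    """Map 'python' and each distribution name to its installed version (None if absent)."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = None
    return info


if __name__ == "__main__":
    for name, ver in report_versions().items():
        print(f"{name}: {ver}")
```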