Hi Nick,

I am seeing a huge runtime for my input data, which is 28K * 59. It has been running for more than a day. I have even standardized the input data. Is there any possible solution?

dist_matrix: 5%|4 | 276/5671 [50:48<16:50:38, 11.24s/it]
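A quick sanity check on the progress line above, as a rough sketch: it uses only the three numbers tqdm printed and assumes the per-iteration time stays constant, which may not hold. It covers only the `dist_matrix` step, not the whole run.

```python
# Numbers copied from the tqdm progress line above.
total_iters = 5671
done_iters = 276
sec_per_iter = 11.24

# Assumption: every remaining iteration takes the same time as the current rate.
remaining_s = (total_iters - done_iters) * sec_per_iter
full_run_s = total_iters * sec_per_iter


def fmt_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    s = int(round(seconds))
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


print("remaining for dist_matrix:", fmt_hms(remaining_s))
print("full dist_matrix pass    :", fmt_hms(full_run_s))
```

At that rate the distance-matrix step alone needs roughly 17 to 18 hours, which fits with the whole run taking more than a day.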

Oct 18 '20 09:10 mhnbitece

same issue here

Nov 25 '20 14:11 maxiuw

Same issue here. It has been running for more than a day on 5 lakh (500,000) * 32 data.

Apr 29 '21 13:04 snigdhasen

same issue!

Nov 16 '21 06:11 luna57-lr

Me too, it's extremely slow on relatively large datasets. A CUDA implementation and/or an `n_jobs` option would be great.

Jan 22 '22 18:01 naeemmrz

I think I have a potential solution for this problem, and it MIGHT work for you:

My problem was that I was using the default settings without specifying any parameters.

Here is my previous code, which was extremely slow:

```python
dataframe_oversampled = smogn.smoter(
    data=dataframe,
    y='TARGET_VARIABLE',
)
```

However, the moment I started tinkering with the parameters, it somehow got around 15 times faster: code that used to take about 6 hours took only about 30 minutes!

Here is how I changed my code; I hope similar tinkering will help you too.

PS: In my project I wrote a dedicated function to handle all missing data because I had special cases, so `drop_na_col` and `drop_na_row` in these parameters are just for good measure.

```python
import smogn

# Apply SMOGN to balance the dataset.
# rg_mtrx (the relevance control points) must be defined beforehand;
# its values are not shown in this post.
dataframe_oversampled = smogn.smoter(
    data=dataframe,
    y='TARGET_VARIABLE',
    k=9,                    ## positive integer (k < n)
    pert=0.04,              ## real number (0 < R < 1)
    samp_method='balance',  ## string ('balance' or 'extreme')
    drop_na_col=True,       ## boolean (True or False)
    drop_na_row=True,       ## boolean (True or False)
    replace=False,          ## boolean (True or False)

    ## phi relevance arguments
    rel_thres=0.10,         ## real number (0 < R < 1)
    rel_method='manual',    ## string ('auto' or 'manual')
    # rel_xtrm_type = 'both', ## unused (rel_method = 'manual')
    # rel_coef = 1.50,        ## unused (rel_method = 'manual')
    rel_ctrl_pts_rg=rg_mtrx ## 2d array (format: [x, y])
)
```
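If you want to compare settings without waiting hours, one option is to time a call on a small subsample first and scale the result up. This is only a sketch: `time_call` and `extrapolate_quadratic` are hypothetical helper names, not part of smogn, and the quadratic scaling is an assumption based on the slow step being a pairwise distance matrix. The real cost also depends on how many rows end up in the rare bins, so treat the number as a rough guide.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def extrapolate_quadratic(elapsed_s, n_small, n_full):
    """Scale a measured time to n_full rows.

    Assumption: cost grows with the square of the row count.
    """
    return elapsed_s * (n_full / n_small) ** 2


# Hypothetical usage with a pandas DataFrame named `dataframe` (not run here):
# small = dataframe.sample(n=2000, random_state=0).reset_index(drop=True)
# _, t = time_call(smogn.smoter, data=small, y='TARGET_VARIABLE')
# hours = extrapolate_quadratic(t, len(small), len(dataframe)) / 3600
# print(f"rough estimate for the full data: {hours:.1f} h")
```

Running this once with the defaults and once with the tuned parameters gives a cheap way to see whether a change actually helps on your data before committing to a full run.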

Oct 14 '23 16:10 MouadEt-tali
