knnimpute
What is wrong when I get a MemoryError without an error code?
I'm trying to use knnimpute to fill the NaN values of a DataFrame. The frame info looks fine:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150000 entries, 0 to 149999
Data columns (total 12 columns):
Unnamed: 0 150000 non-null int64
SeriousDlqin2yrs 150000 non-null int64
RevolvingUtilizationOfUnsecuredLines 150000 non-null float64
age 150000 non-null int64
NumberOfTime30-59DaysPastDueNotWorse 150000 non-null int64
DebtRatio 150000 non-null float64
MonthlyIncome 120269 non-null float64
NumberOfOpenCreditLinesAndLoans 150000 non-null int64
NumberOfTimes90DaysLate 150000 non-null int64
NumberRealEstateLoansOrLines 150000 non-null int64
NumberOfTime60-89DaysPastDueNotWorse 150000 non-null int64
NumberOfDependents 146076 non-null float64
dtypes: float64(4), int64(8)
memory usage: 13.7 MB
That isn't very large, right?
But I get a MemoryError with no hint or error code when running:
from knnimpute import knn_impute_reference
X_imputed =knn_impute_reference(test_data.iloc[:,2:].values, np.isnan(test_data.iloc[:,2:].values), k=3)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-39-d0cef60a839b> in <module>()
1 from knnimpute import knn_impute_reference
----> 2 X_imputed =knn_impute_reference(test_data.iloc[:,2:].values, np.isnan(test_data.iloc[:,2:].values), k=3)
d:\Anaconda3\lib\site-packages\knnimpute\reference.py in knn_impute_reference(X, missing_mask, k, verbose, print_interval)
29 n_rows, n_cols = X.shape
30 X_result, D, effective_infinity = \
---> 31 knn_initialize(X, missing_mask, verbose=verbose)
32
33 for i in range(n_rows):
d:\Anaconda3\lib\site-packages\knnimpute\common.py in knn_initialize(X, missing_mask, verbose, min_dist, max_dist_multiplier)
37 # to put NaN's back in the data matrix for the distances function
38 X_row_major[missing_mask] = np.nan
---> 39 D = all_pairs_normalized_distances(X_row_major)
40 D_finite_flat = D[np.isfinite(D)]
41 if len(D_finite_flat) > 0:
d:\Anaconda3\lib\site-packages\knnimpute\normalized_distance.py in all_pairs_normalized_distances(X)
36
37 # matrix of mean squared difference between between samples
---> 38 D = np.ones((n_rows, n_rows), dtype="float32", order="C") * np.inf
39
40 # we can cheaply determine the number of columns that two rows share
d:\Anaconda3\lib\site-packages\numpy\core\numeric.py in ones(shape, dtype, order)
190
191 """
--> 192 a = empty(shape, dtype, order)
193 multiarray.copyto(a, 1, casting='unsafe')
194 return a
MemoryError:
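The last knnimpute frame in the traceback allocates a dense `(n_rows, n_rows)` float32 matrix, so the memory needed grows with the square of the row count, not with the 13.7 MB size of the frame. A rough back-of-the-envelope check, using only the row count and dtype shown above (`pairwise_matrix_bytes` is a hypothetical helper name, not part of knnimpute):

```python
import numpy as np

def pairwise_matrix_bytes(n_rows, dtype="float32"):
    """Bytes needed for a dense (n_rows, n_rows) array of the given dtype."""
    return n_rows * n_rows * np.dtype(dtype).itemsize

n_rows = 150_000  # row count from the DataFrame info above
needed = pairwise_matrix_bytes(n_rows)

# Report the size in decimal gigabytes and binary gibibytes.
print(f"{needed / 1e9:.0f} GB ({needed / 2**30:.1f} GiB)")
```

That comes to roughly 90 GB for the distance matrix alone, which would probably explain the bare MemoryError on a typical workstation. The `* np.inf` on the same line likely needs a second temporary of the same size as well.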
Can you try passing an np.array instead of a pandas DataFrame?
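Since `test_data.iloc[:, 2:].values` is already a NumPy array, switching the input type is unlikely to change the allocation shown in the traceback. One way to keep memory bounded is to compute distances one incomplete row at a time instead of for all pairs at once. The sketch below is not knnimpute's algorithm: `knn_impute_rowwise` is a hypothetical helper, it assumes mean squared difference over the columns two rows share as the distance, and it fills each gap with a plain unweighted mean of the k nearest rows that have that column observed.

```python
import numpy as np

def knn_impute_rowwise(X, k=3):
    """Fill NaNs from the k nearest rows, handling one incomplete row at a time."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    X_zero = np.where(missing, 0.0, X)  # NaNs replaced by 0 so arithmetic stays finite
    result = X.copy()

    for i in np.flatnonzero(missing.any(axis=1)):
        # Columns observed both in row i and in each other row.
        shared = ~missing & ~missing[i]
        n_shared = shared.sum(axis=1)

        # Mean squared difference over shared columns; inf where nothing is shared.
        sq_diff = ((X_zero - X_zero[i]) ** 2 * shared).sum(axis=1)
        dist = np.full(len(X), np.inf)
        np.divide(sq_diff, n_shared, out=dist, where=n_shared > 0)
        dist[i] = np.inf  # a row is never its own neighbour

        for j in np.flatnonzero(missing[i]):
            # Only rows that have column j observed and a finite distance can donate.
            candidates = np.flatnonzero(~missing[:, j] & np.isfinite(dist))
            if candidates.size == 0:
                continue  # no usable neighbour: leave the NaN in place
            nearest = candidates[np.argsort(dist[candidates])[:k]]
            result[i, j] = X[nearest, j].mean()
    return result

# Tiny made-up example: row 1 is missing its second column.
X_small = np.array([[1.0, 2.0],
                    [1.1, np.nan],
                    [5.0, 6.0],
                    [1.2, 2.2]])
filled = knn_impute_rowwise(X_small, k=2)
```

Peak extra memory here is on the order of `n_rows * n_cols` per step rather than `n_rows * n_rows`, at the cost of a Python-level loop over every incomplete row, so expect it to be slow on 150000 rows.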