
The first parameter passed from the Knn.fit to the Metric does not belong to the DataFrame

Open 08volt opened this issue 4 years ago • 3 comments

Hi, I have a problem with Knn.fit when I use the HVDM metric. To inspect the data being passed in, I added two print statements at the beginning of the metric function:

[Screenshot 2020-10-23 at 17:03:39]

I don't understand why, when I call Knn.fit, the "x" parameter is so different from the DataFrame:

[Screenshot 2020-10-23 at 17:05:36]

Can anyone help me understand what the KNN is doing?

Thank you

08volt • Oct 23 '20 15:10
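A possible explanation, offered as a sketch since the screenshots are not reproduced here: scikit-learn converts the input to a NumPy array before fitting, and with a callable metric it typically builds a ball tree. While building the tree it calls the metric on 1-D NumPy vectors, and some of those vectors are node centroids (feature-wise means of several rows), so they need not match any row of the DataFrame. The example below uses a made-up logging metric (`logging_metric`, not part of distython) and tiny made-up data, and forces `algorithm="ball_tree"` to make the behaviour visible.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Tiny made-up dataset: 4 rows, 3 numeric features.
X = np.arange(12, dtype=float).reshape(4, 3)

seen = []  # copies of every vector the metric receives


def logging_metric(x, y):
    """Hypothetical metric: record both arguments, return Manhattan distance."""
    seen.append(np.array(x))  # copy, in case the underlying buffer is reused
    seen.append(np.array(y))
    return float(np.abs(x - y).sum())


# Force the ball tree so the metric is called while fitting.
nn = NearestNeighbors(n_neighbors=2, algorithm="ball_tree", metric=logging_metric)
nn.fit(X)

# Vectors passed to the metric that are not rows of X.
off_data = [v for v in seen if not any(np.array_equal(v, row) for row in X)]

print("metric calls during fit:", len(seen) // 2)
print("argument type:", type(seen[0]).__name__)
print("example vector not in X:", off_data[0] if off_data else None)
```

If a metric looks up categorical values in tables built from the training data, a centroid with fractional category values may not be found in those tables. Passing `algorithm="brute"` to `NearestNeighbors` should make scikit-learn compare actual rows only, at the cost of speed.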

Hi, HVDM seems to have a bug and doesn't work correctly. Please use HEOM instead.

KacperKubara • Feb 09 '21 16:02

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import load_boston

# Importing a custom metric class
from distython import HEOM
from distython import HVDM
from distython import VDM

# Load the dataset from sklearn
boston = load_boston()
print(type(boston))
print(boston.data.shape)
boston_data = boston["data"]

# Categorical variables in the data
categorical_ix = [3, 8]
y_ix = [12]

# The problem here is that NearestNeighbors can't handle np.nan
# So we have to set up the NaN equivalent
nan_eqv = 12345

# Introduce some missingness to the data for the purpose of the example
row_cnt, col_cnt = boston_data.shape
for i in range(row_cnt):
    for j in range(col_cnt):
        rand_val = np.random.randint(20, size=1)
        if rand_val == 10:
            boston_data[i, j] = nan_eqv

# Declare the HEOM with a correct NaN equivalent value
heom_metric = HEOM(boston_data, categorical_ix, nan_equivalents=[nan_eqv])
hvdm_metric = HVDM(boston_data, y_ix, categorical_ix, nan_equivalents=[nan_eqv])

# Declare NearestNeighbor and link the metric
neighbor = NearestNeighbors(metric=heom_metric.heom)
neighbor1 = NearestNeighbors(metric=hvdm_metric.hvdm)

# Fit the model which uses the custom distance metric
neighbor.fit(boston_data)
neighbor1.fit(boston_data)

# Return 5-Nearest Neighbors to the 1st instance (row 1)
result = neighbor.kneighbors(boston_data[0].reshape(1, -1), n_neighbors=5)
result1 = neighbor1.kneighbors(boston_data[0].reshape(1, -1), n_neighbors=5)

print(result)
print(result1)
```

Error:

```
Division by zero is not allowed!

UnboundLocalError                         Traceback (most recent call last)
in
      1 # Fit the model which uses the custom distance metric
      2 neighbor.fit(boston_data)
----> 3 neighbor1.fit(boston_data)
      4 # Return 5-Nearest Neighbors to the 1st instance (row 1)
      5 result = neighbor.kneighbors(boston_data[0].reshape(1, -1), n_neighbors = 5)
```

varshakhandekar • Apr 01 '21 06:04

There is a division-by-zero error in HVDM.

varshakhandekar • Apr 01 '21 06:04
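The output reported in this thread shows a "Division by zero is not allowed!" message immediately followed by an UnboundLocalError. One pattern that produces exactly this pair, shown here as a guess rather than as distython's actual code, is catching ZeroDivisionError, printing a message, and then returning a variable that was never assigned. The helper names below are hypothetical, and the |a - b| / (4 * std) normalisation is the usual HVDM form for numeric attributes, assumed here for illustration.

```python
def normalised_diff_buggy(a, b, std):
    """Hypothetical helper: swallows ZeroDivisionError, then uses the unset result."""
    try:
        result = abs(a - b) / (4 * std)
    except ZeroDivisionError:
        print("Division by zero is not allowed!")
    return result  # raises UnboundLocalError if the division failed


def normalised_diff_guarded(a, b, std):
    """Same computation, with an explicit fallback when std is zero."""
    if std == 0:
        # Assumption: a constant column contributes no distance.
        return 0.0
    return abs(a - b) / (4 * std)


caught = None
try:
    normalised_diff_buggy(1.0, 2.0, 0.0)
except UnboundLocalError as exc:
    caught = type(exc).__name__

print("buggy version raised:", caught)
print("guarded version returned:", normalised_diff_guarded(1.0, 2.0, 0.0))
```

If this is what happens, a column with zero standard deviation (for example, a constant feature) would be enough to trigger it. Whether returning 0.0 is the right fallback depends on the data, so treat the guard as one option, not a fix for the library.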