distython
The first parameter passed from the Knn.fit to the Metric does not belong to the DataFrame
Hi, I have a problem with Knn.fit when I'm using the HVDM metric. To visualize the data I'm working on, I added two prints at the beginning of the function:
I don't understand why, when I call Knn.fit, the "x" parameter is so different from the DataFrame:
Can anyone help me understand what the KNN is doing?
Thank you
Hi, HVDM seems to have a bug and doesn't work correctly. Please use HEOM instead.
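On the original question: a likely explanation, assuming scikit-learn's default `algorithm='auto'` ends up building a ball tree for a callable metric, is that the tree computes distances to node centroids (the mean of the points in a node), and those centroids are not rows of your data. The sketch below uses a toy array and a hypothetical `logging_metric` (not part of distython) to count metric arguments that don't match any row, once with `algorithm='ball_tree'` and once with `algorithm='brute'`.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
data = rng.random((50, 3))  # toy data standing in for the DataFrame values


def is_row(v, data):
    """Return True if v matches one of the rows of data."""
    v = np.asarray(v)
    if v.shape != (data.shape[1],):
        return False
    return bool(np.any(np.all(np.isclose(data, v), axis=1)))


def count_foreign_args(algorithm):
    """Fit and query with a logging metric; count arguments that are not rows of data."""
    foreign = 0

    def logging_metric(x, y):
        # Hypothetical metric: plain Euclidean distance plus bookkeeping.
        nonlocal foreign
        foreign += (not is_row(x, data)) + (not is_row(y, data))
        return float(np.sqrt(np.sum((x - y) ** 2)))

    nn = NearestNeighbors(metric=logging_metric, algorithm=algorithm)
    nn.fit(data)
    nn.kneighbors(data[0].reshape(1, -1), n_neighbors=5)
    return foreign


foreign_ball_tree = count_foreign_args("ball_tree")
foreign_brute = count_foreign_args("brute")
print("arguments not found in data (ball_tree):", foreign_ball_tree)
print("arguments not found in data (brute):", foreign_brute)
```

If the metric assumes its inputs are actual rows of the dataset, passing `algorithm='brute'` to `NearestNeighbors` may be worth trying, since in that mode the metric should only be called on pairs of query and training rows.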
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.datasets import load_boston

# Importing a custom metric class
from distython import HEOM
from distython import HVDM
from distython import VDM

# Load the dataset from sklearn
boston = load_boston()
print(type(boston))
print(boston.data.shape)
boston_data = boston["data"]

# Categorical variables in the data
categorical_ix = [3, 8]
y_ix = [12]

# The problem here is that NearestNeighbors can't handle np.nan
# So we have to set up the NaN equivalent
nan_eqv = 12345

# Introduce some missingness to the data for the purpose of the example
row_cnt, col_cnt = boston_data.shape
for i in range(row_cnt):
    for j in range(col_cnt):
        rand_val = np.random.randint(20, size=1)
        if rand_val == 10:
            boston_data[i, j] = nan_eqv

# Declare the HEOM with a correct NaN equivalent value
heom_metric = HEOM(boston_data, categorical_ix, nan_equivalents=[nan_eqv])
hvdm_metric = HVDM(boston_data, y_ix, categorical_ix, nan_equivalents=[nan_eqv])

# Declare NearestNeighbor and link the metric
neighbor = NearestNeighbors(metric=heom_metric.heom)
neighbor1 = NearestNeighbors(metric=hvdm_metric.hvdm)

# Fit the model which uses the custom distance metric
neighbor.fit(boston_data)
neighbor1.fit(boston_data)

# Return 5-Nearest Neighbors to the 1st instance (row 1)
result = neighbor.kneighbors(boston_data[0].reshape(1, -1), n_neighbors=5)
result1 = neighbor1.kneighbors(boston_data[0].reshape(1, -1), n_neighbors=5)

print(result)
print(result1)
Error Division by zero is not allowed!
UnboundLocalError Traceback (most recent call last)
There is a division-by-zero error in HVDM.
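I haven't traced where distython divides by zero, so the following is only a guess at one place it can happen. In the usual HVDM definition, the numeric-attribute term is |x - y| / (4 * std), which has a zero denominator whenever a column is constant. The sketch below is not distython's code; `normalized_diff` and its `eps` parameter are hypothetical names, and returning 0 for a constant column is an assumption of this sketch.

```python
import numpy as np


def normalized_diff(x, y, std, eps=1e-12):
    """Numeric-attribute term |x - y| / (4 * std), guarded against std == 0.

    Assumption: a (near-)constant column contributes zero distance.
    """
    denom = 4.0 * std
    if denom < eps:
        return 0.0
    return abs(x - y) / denom


# A constant column has std == 0, so the unguarded formula would divide by zero.
constant = np.array([2.0, 2.0, 2.0])
constant_col_dist = normalized_diff(constant[0], constant[1], constant.std())

# A varying column goes through the normal formula.
varying = np.array([1.0, 2.0, 3.0])
varying_dist = normalized_diff(varying[0], varying[2], varying.std())

print(constant_col_dist, varying_dist)
```

The categorical (VDM) part of HVDM divides by value counts as well, so a similar guard might be needed there too.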