rbig
Information theory metrics calculation issue
Hi Team, thanks for addressing the issue of density estimation for multidimensional data. I have a few questions as I am trying to implement information theory metrics:
- Q1. Is this method apt for high-dimensional tabular data?
- Q2. I have been trying to run RBIG `mutual_info()` over tabular data, and the results are exactly the same for all of the target variables. I checked the results using the scikit-learn MI score and got varying results there (results not normalized in either case, scikit-learn and RBIG). I don't understand the error; can you help me with this in any way?
Below is the piece of code I used:

```python
import numpy as np
import pandas as pd
# MutualInfoRBIG comes from the rbig package

# X: features (attributes not in Y)
# Y: set of y attributes (attributes not in X), let's say y1, y2, y3, y4
def calculate_miscore_xa(data, X, Y):
    mis_xy = []
    y_attributes = []
    for y in Y:
        rbig_model = MutualInfoRBIG(max_layers=10000)
        rbig_model.fit(data[X], data[[y]])
        mi_rbig = rbig_model.mutual_info() * np.log(2)
        mis_xy.append(mi_rbig)
        y_attributes.append(y)  # was `a`, which is undefined in this scope
    return pd.DataFrame({'Y': y_attributes, 'I(Xi,Y)': mis_xy})
```
Basically, the results I am getting satisfy I(X,y1) = I(X,y2) = I(X,y3) = I(X,y4), all exactly the same. That is unusual, so I cross-checked using https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html, and there the results for I(X,y1), I(X,y2), I(X,y3), I(X,y4) differ. Can you help me understand if there is anything I am doing wrong?
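For reference, this is the kind of scikit-learn cross-check I mean. It's a minimal sketch with synthetic placeholder data, not my actual dataset; note that `mutual_info_score` expects discrete labels, so continuous columns are binned first (the bin count of 20 is an arbitrary choice):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y1 = x + rng.normal(scale=0.5, size=1000)  # strongly dependent on x
y2 = rng.normal(size=1000)                 # independent of x

def binned_mi(a, b, bins=20):
    # Discretize both continuous variables, then compute MI (in nats)
    # on the resulting bin labels.
    a_d = np.digitize(a, np.histogram_bin_edges(a, bins=bins))
    b_d = np.digitize(b, np.histogram_bin_edges(b, bins=bins))
    return mutual_info_score(a_d, b_d)

print(binned_mi(x, y1))  # noticeably larger than the independent case
print(binned_mi(x, y2))  # near zero (up to estimation bias)
```

With this check, different targets give clearly different MI values, unlike what I see from RBIG.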
Also, can the original entropy-based calculation implemented in the information theory notebook be used as a base for tabular data, by substituting the respective X and Y in 2-D format?
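By that I mean the identity I(X;Y) = H(X) + H(Y) - H(X,Y) applied to 2-D arrays. Here is a toy sketch of what I have in mind, using a closed-form Gaussian entropy estimator purely for illustration (the data, shapes, and estimator are placeholders, not the notebook's actual code; the RBIG entropy estimator would be substituted for `gaussian_entropy`):

```python
import numpy as np

def gaussian_entropy(samples):
    # Differential entropy (in nats) of a multivariate Gaussian fitted to
    # `samples`, where rows are observations and columns are dimensions.
    d = samples.shape[1]
    cov = np.cov(samples, rowvar=False).reshape(d, d)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d * np.log(2 * np.pi * np.e) + logdet)

rng = np.random.default_rng(0)
x = rng.normal(size=(2000, 2))                          # X block, 2-D
y = x[:, [0]] + rng.normal(scale=0.5, size=(2000, 1))   # Y depends on X

# I(X;Y) = H(X) + H(Y) - H(X,Y), with the joint built by column-stacking
mi = (gaussian_entropy(x) + gaussian_entropy(y)
      - gaussian_entropy(np.hstack([x, y])))
print(mi)  # positive, since y depends on x
```

Would this decomposition be a valid base for tabular X and Y, or is the direct `MutualInfoRBIG` route preferred?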
Thanks and regards,
Surbhi