
Information theory metrics calculation issue

Open surbhir08 opened this issue 2 years ago • 0 comments

Hi Team, Thanks for addressing the issue of density estimation for multidimensional data. I have a few questions as I am trying to implement information theory metrics:

  • Q1. Is this method apt for high-dimensional tabular data?
  • Q2. I have been trying to run RBIG's `mutual_info()` over tabular data and the results are exactly the same for all targets. I checked the results using scikit-learn's MI score and got varying results (results not normalized in either case, scikit-learn and RBIG). I don't understand the error; can you help me with this?

Below is the code I used:

```python
import numpy as np
import pandas as pd
# MutualInfoRBIG is imported from the rbig package

# X: feature columns (attributes not in Y)
# Y: target columns (e.g. y1, y2, y3, y4)
def calculate_miscore_xa(data, X, Y):
    mis_xy = []
    y_attributes = []
    for y in Y:
        rbig_model = MutualInfoRBIG(max_layers=10000)
        rbig_model.fit(data[X], data[[y]])
        mi_rbig = rbig_model.mutual_info() * np.log(2)
        mis_xy.append(mi_rbig)
        y_attributes.append(y)  # was append(a): `a` is undefined
    return pd.DataFrame({'Y': y_attributes, 'I(Xi,Y)': mis_xy})
```

Basically, the results I am getting are I(X,y1) = I(X,y2) = I(X,y3) = I(X,y4), all exactly the same. That is unusual, so I checked the results using https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mutual_info_score.html, and there the values for I(X,y1), I(X,y2), I(X,y3), I(X,y4) differ. Can you help me understand if there is anything I am doing wrong?
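For reference, here is a minimal sketch of the kind of per-target sanity check described above, using scikit-learn's `mutual_info_regression` (which, unlike `mutual_info_score`, handles continuous features). The column names and data are made up for illustration; with RBIG working correctly, one would similarly expect a dependent target to score higher than an independent one:

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import mutual_info_regression

# Hypothetical data: two features plus two targets
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "x1": rng.normal(size=500),
    "x2": rng.normal(size=500),
})
data["y1"] = data["x1"] + 0.1 * rng.normal(size=500)  # strongly tied to x1
data["y2"] = rng.normal(size=500)                     # independent noise

X_cols = ["x1", "x2"]
scores = {}
for y in ["y1", "y2"]:
    # MI (in nats) between each feature column and the target, summed
    mi = mutual_info_regression(data[X_cols], data[y], random_state=0)
    scores[y] = mi.sum()

# The dependent target should score clearly higher than the noise target
print(scores["y1"] > scores["y2"])
```

If the RBIG estimates for such clearly different targets still come out identical, that points at the model setup rather than the targets themselves.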

Also, can the original entropy-based calculation implemented in the information theory notebook be used as a base for tabular data, by substituting the respective X and Y in 2D format?
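The entropy-based route mentioned above relies on the identity I(X;Y) = H(X) + H(Y) - H(X,Y). As a quick illustration of that identity (on discrete data with a plug-in entropy estimate, not RBIG's continuous estimator):

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Shannon entropy (in nats) via the plug-in estimate."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

rng = np.random.default_rng(1)
x = rng.integers(0, 3, size=2000)
y = (x + rng.integers(0, 2, size=2000)) % 3  # y depends on x

h_x = entropy(x.tolist())
h_y = entropy(y.tolist())
h_xy = entropy(list(zip(x.tolist(), y.tolist())))

mi = h_x + h_y - h_xy  # I(X;Y) = H(X) + H(Y) - H(X,Y)
print(mi)
```

The same decomposition should hold for any entropy estimator applied to 2D-shaped X and Y, which is presumably what substituting into the notebook's calculation would do.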

Thanks and regards,
Surbhi

surbhir08 · Apr 29 '22 05:04