CategoricalEmbedder

Unable to ce.fit_transform on the test set

Open: lavpy opened this issue 5 years ago • 1 comment

Hello Shivanand:

I am trying to implement your library in a Kaggle competition (https://www.kaggle.com/c/house-prices-advanced-regression-techniques). I have transformed the training set applying the following code:

embeddings = ce.get_embeddings(X_train, y_train, categorical_embedding_info=embedding_info, is_classification=True, epochs=100, batch_size=256)

That gave me the embeddings. I then tried to transform the test set as follows:

test_transformed = ce.fit_transform(X_test, embeddings=embeddings, encoders=encoders, drop_categorical_vars=True)

But it raises the following error: You are trying to merge on int32 and object columns. If you wish to proceed you should use pd.concat

Sep 05 '20 03:09 lavpy

Hi @lavpy, this repo is deprecated and is no longer maintained.

To solve your problem, you may need to downgrade the dependencies:

!pip install tensorflow_addons==0.8.3
!pip install tqdm==4.41.1
!pip install keras==2.3.1
!pip install tensorflow==2.2.0

Then,

import categorical_embedder as ce
embedding_info = ce.get_embedding_info(X)
X_encoded,encoders = ce.get_label_encoded_data(X)

embeddings = ce.get_embeddings(X, y, categorical_embedding_info=embedding_info,
                               is_classification=True, epochs=100, batch_size=256)
embeddings_df = ce.get_embeddings_in_dataframe(embeddings, encoders)

Now embeddings_df will hold the embeddings for every categorical variable. You can access them like this:

embeddings_df['education']

                  education_embedding_0  education_embedding_1
Bachelor's                     0.226899               0.150172
Below Secondary                0.438177               0.406307
Master's & above               0.071212               0.054443

Now just map these embeddings onto your data (train or test), matching each categorical column's values to the rows of its embedding table.
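One way to do that mapping with plain pandas is sketched below. It assumes that embeddings_df['education'] is a DataFrame indexed by the raw category labels, as the printed output above suggests. The X_test frame, its "age" column, and the map_embeddings helper are hypothetical stand-ins for illustration and are not part of categorical_embedder.

```python
import pandas as pd

# Stand-in for embeddings_df['education'] (values copied from the output above).
# Assumption: the index holds the raw category labels.
education_embeddings = pd.DataFrame(
    {
        "education_embedding_0": [0.226899, 0.438177, 0.071212],
        "education_embedding_1": [0.150172, 0.406307, 0.054443],
    },
    index=["Bachelor's", "Below Secondary", "Master's & above"],
)

# Hypothetical test set containing the raw (not label-encoded) category values.
X_test = pd.DataFrame(
    {
        "education": ["Master's & above", "Bachelor's", "Below Secondary", "Bachelor's"],
        "age": [35, 28, 41, 30],
    }
)


def map_embeddings(df, column, embedding_table):
    """Replace `column` in `df` with the matching rows of `embedding_table`."""
    # Look up one embedding row per data row; labels missing from the
    # embedding table come back as NaN instead of raising an error.
    mapped = embedding_table.reindex(df[column])
    # Re-use the data's own index so the two frames line up row by row.
    mapped.index = df.index
    return pd.concat([df.drop(columns=[column]), mapped], axis=1)


X_test_mapped = map_embeddings(X_test, "education", education_embeddings)
print(X_test_mapped)
```

Because the lookup goes through the category labels rather than a merge on encoded integers, the key types on both sides stay the same. Categories that appear only in the test set end up as NaN and need separate handling.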

Sep 18 '20 20:09 Shivanandroy