Unable to run ce.fit_transform on the test set
Hello Shivanand:
I am trying to use your library in a Kaggle competition (https://www.kaggle.com/c/house-prices-advanced-regression-techniques). I generated embeddings from the training set with the following code:
embeddings = ce.get_embeddings(X_train, y_train, categorical_embedding_info=embedding_info, is_classification=True, epochs=100, batch_size=256)
That produced the embeddings. I then tried to transform the test set as follows:
test_transformed = ce.fit_transform(X_test, embeddings=embeddings, encoders=encoders, drop_categorical_vars=True)
But it raises the following error:
You are trying to merge on int32 and object columns. If you wish to proceed you should use pd.concat
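The message itself comes from pandas: merging a key column of integers against a key column of strings raises a ValueError. A minimal reproduction in plain pandas, assuming (not confirmed from the library's source) that inside fit_transform the test set's categorical column still holds raw strings while the embedding table is keyed by int32 label codes. The `emb` frame and the `label_map` dict are hypothetical stand-ins for the library's internal table and the fitted training encoder:

```python
import numpy as np
import pandas as pd

# Hypothetical embedding table keyed by integer label codes (int32).
emb = pd.DataFrame({
    "education": np.array([0, 1, 2], dtype="int32"),
    "education_embedding_0": [0.23, 0.44, 0.07],
})

# Hypothetical test set whose categorical column still holds raw strings.
test = pd.DataFrame({"education": ["Bachelor's", "Master's & above", "Bachelor's"]})

# Merging int32 keys against string keys reproduces the pandas ValueError.
try:
    test.merge(emb, on="education", how="left")
    merge_failed = False
except ValueError as err:
    merge_failed = True
    print(err)

# Fix: encode the test column with the SAME mapping used on the training set
# (hypothetical dict here), so both merge keys are int32.
# Categories unseen in training would map to NaN and make astype("int32") fail.
label_map = {"Bachelor's": 0, "Below Secondary": 1, "Master's & above": 2}
test["education"] = test["education"].map(label_map).astype("int32")
fixed = test.merge(emb, on="education", how="left")
print(fixed)
```

Either side can be changed, as long as both merge keys end up with the same dtype.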
Hi @lavpy, this repo is deprecated and is no longer maintained.
To solve your problem, you may need to downgrade the dependencies:
!pip install tensorflow_addons==0.8.3
!pip install tqdm==4.41.1
!pip install keras==2.3.1
!pip install tensorflow==2.2.0
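Because the workaround depends on exact versions, it can help to confirm the pins actually took effect. In a notebook, a package that was already imported keeps its old version until the kernel is restarted. A small sketch using the standard library's `importlib.metadata` (Python 3.8+); `check_pins` is a hypothetical helper, not part of categorical_embedder. Note that TensorFlow 2.2.0 only shipped wheels for Python 3.5 to 3.8, so the install itself may fail on a newer interpreter:

```python
from importlib.metadata import PackageNotFoundError, version

# Versions suggested above for the deprecated categorical_embedder.
PINS = {
    "tensorflow-addons": "0.8.3",
    "tqdm": "4.41.1",
    "keras": "2.3.1",
    "tensorflow": "2.2.0",
}


def check_pins(pins):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    for name, (want, got) in check_pins(PINS).items():
        print(f"{name}: expected {want}, found {got or 'not installed'}")
```

An empty result means every pinned package is installed at the expected version.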
Then,
import categorical_embedder as ce
embedding_info = ce.get_embedding_info(X)
X_encoded,encoders = ce.get_label_encoded_data(X)
embeddings = ce.get_embeddings(X, y, categorical_embedding_info=embedding_info,
is_classification=True, epochs=100, batch_size=256)
embeddings_df = ce.get_embeddings_in_dataframe(embeddings, encoders)
Now embeddings_df holds the embeddings of every categorical variable; you can access them by variable name, for example:
embeddings_df['education']
education_embedding_0 education_embedding_1
Bachelor's 0.226899 0.150172
Below Secondary 0.438177 0.406307
Master's & above 0.071212 0.054443
Now just map these embeddings onto your data by matching on each categorical variable.
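A minimal sketch of that mapping step, assuming (as the printed output suggests) that `embeddings_df` behaves like a dict of DataFrames, one per categorical variable, each indexed by the original category labels. The frame below is rebuilt from the `education` values printed above, and `map_embeddings` is a hypothetical helper, not part of categorical_embedder:

```python
import pandas as pd

# Stand-in for embeddings_df['education'] as printed above: indexed by the
# original category labels, one column per embedding dimension.
embeddings_df = {
    "education": pd.DataFrame(
        {
            "education_embedding_0": [0.226899, 0.438177, 0.071212],
            "education_embedding_1": [0.150172, 0.406307, 0.054443],
        },
        index=["Bachelor's", "Below Secondary", "Master's & above"],
    )
}


def map_embeddings(df, embeddings_df, drop_categorical_vars=True):
    """Hypothetical helper: replace each categorical column with its embedding columns."""
    out = df.copy()
    for col, emb in embeddings_df.items():
        # DataFrame.join(on=col) matches the column's values against emb's index
        # and keeps out's row order and index (a left join by default).
        # Categories absent from emb (unseen in training) get NaN embeddings.
        out = out.join(emb, on=col)
        if drop_categorical_vars:
            out = out.drop(columns=col)
    return out
```

Joining on the raw string labels keeps both keys as strings, so the int32/object mismatch from the original error should not arise; unseen categories still need a fill strategy of your choice.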