CRM-LTR
Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products
This repository contains the code and the Commercial Dataset needed to reproduce the results presented in the following paper: Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products.
Note: Updates to the repository coming soon.
Commercial Dataset: E-commerce dataset for LTR
The dataset and its description can be found here.
Neural network architecture of S-CNN
We selected a simple yet powerful CNN model proposed by Severyn et al. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.723.6492&rep=rep1&type=pdf] for the empirical evaluation of our CRM approach. We refer to this model as S-CNN in the paper. The figure below depicts the architecture of the neural network; the figure is taken from the paper by Severyn et al. The Keras implementation was adapted from https://github.com/gvishal/rank_text_cnn.
Evaluation
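As a rough illustration of this architecture, the forward pass can be sketched in plain NumPy: each sentence matrix goes through a convolution with max-over-time pooling, a bilinear similarity score joins the two pooled vectors, and a hidden layer feeds a 2-way softmax. All dimensions, parameter names, and random inputs below are illustrative assumptions, not the actual S-CNN implementation in this repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, filters):
    """Valid 1-D convolution over a sentence matrix x of shape (len, emb_dim)."""
    num_f, width, emb_dim = filters.shape
    out_len = x.shape[0] - width + 1
    out = np.empty((out_len, num_f))
    for i in range(out_len):
        window = x[i:i + width]  # (width, emb_dim)
        out[i] = np.maximum(0.0, np.tensordot(filters, window, axes=([1, 2], [0, 1])))
    return out

def sentence_vector(x, filters):
    """Convolution followed by max-over-time pooling -> fixed-size vector."""
    return conv1d_relu(x, filters).max(axis=0)

emb_dim, width, num_f = 50, 5, 100
filters = rng.normal(scale=0.1, size=(num_f, width, emb_dim))

query   = rng.normal(size=(7, emb_dim))    # toy query sentence matrix
product = rng.normal(size=(12, emb_dim))   # toy product-title sentence matrix

xq = sentence_vector(query, filters)       # (100,)
xd = sentence_vector(product, filters)     # (100,)

# Similarity layer: sim = xq^T M xd, joined with the pooled vectors.
M = rng.normal(scale=0.1, size=(num_f, num_f))
sim = xq @ M @ xd
join = np.concatenate([xq, [sim], xd])     # (201,)

# Hidden layer plus 2-way softmax (relevant vs. not relevant).
W_h = rng.normal(scale=0.05, size=(join.size, 64)); b_h = np.zeros(64)
W_o = rng.normal(scale=0.05, size=(64, 2));         b_o = np.zeros(2)
h = np.maximum(0.0, join @ W_h + b_h)
logits = h @ W_o + b_o
probs = np.exp(logits - logits.max()); probs /= probs.sum()
print(probs.shape)  # (2,)
```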
Reproducing the results
First, install Jupyter Notebook using the following command (for details, click here):
pip3 install jupyter
If you are not familiar with running a notebook, click here.
Download the trec_eval tool and the Mercateo Dataset.
- CRM_Training
- crm_training_clicks.ipynb: Run this Jupyter notebook to train the CRM model from AtB click logs [reproduces the results in Table 2 of the paper].
- crm_training_orders.ipynb: Run this Jupyter notebook to train the CRM model from order logs [reproduces the results in Table 3 of the paper].
- crm_model.py: Keras implementation of CNN for short text pairs with counterfactual risk minimization (CRM) loss function.
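As background for the CRM loss, counterfactual risk minimization trains the new model on logged bandit feedback by minimizing a clipped inverse-propensity-scored estimate of the risk. The NumPy sketch below illustrates such an estimator; the function name, the toy numbers, and the exact form of the loss are assumptions for exposition, not the code in crm_model.py.

```python
import numpy as np

def crm_ips_loss(model_probs, propensities, losses, clip_m=10.0):
    """Clipped inverse-propensity-scored empirical risk (CRM-style objective).

    model_probs  : probability the new model assigns to each logged action
    propensities : probability the logging policy assigned to that action
    losses       : observed loss for each logged (context, action) pair
                   (e.g. 0 for a click/order, 1 otherwise)
    clip_m       : clipping constant M bounding the importance weights
    """
    weights = np.minimum(clip_m, model_probs / propensities)
    return float(np.mean(losses * weights))

# Toy logged bandit feedback for three impressions.
model_probs  = np.array([0.9, 0.2, 0.6])
propensities = np.array([0.5, 0.5, 0.1])
losses       = np.array([0.0, 1.0, 1.0])

print(crm_ips_loss(model_probs, propensities, losses))
```

Clipping the importance weights (the min with M) is what keeps the estimator's variance under control when the logging policy assigned a very small propensity to an action.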
- Cross_Entropy_Training
- cross_entropy_training_clicks.ipynb: Run this Jupyter notebook to train the CNN model with the cross-entropy loss [reproduces the results in Table 2 of the paper].
- cross_entropy_training_orders.ipynb: Run this Jupyter notebook to train the CNN model with the cross-entropy loss [reproduces the results in Table 3 of the paper].
- model_cross_entropy.py: Keras implementation of CNN for short text pairs with cross-entropy loss.
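For contrast with the CRM objective, the cross-entropy baseline treats the logged feedback as supervised labels for a 2-way softmax. A minimal, numerically stable NumPy sketch of such a loss (illustrative only, not the code in model_cross_entropy.py):

```python
import numpy as np

def cross_entropy_loss(logits, labels):
    """Mean softmax cross-entropy; labels in {0, 1} (not relevant / relevant)."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

logits = np.array([[2.0, 0.5],
                   [0.1, 1.9]])
labels = np.array([0, 1])
print(cross_entropy_loss(logits, labels))
```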
- LambdaMART Training
- Download the binary file of RankLib tool from here.
- We used the latest binary, 'RankLib-2.1-patched.jar', for our experiments.
- Train the LambdaMART model for Graded Order Labels [Table 3] by running this command:
java -jar RankLib-2.1-patched.jar -train LambdaMART_files/New_Graded_Order_TrainFile.csv -test LambdaMART_files/New_Graded_Order_TestFile.csv -validate LambdaMART_files/New_Graded_Order_DevFile.csv -ranker 6 -metric2t NDCG@10 -metric2T NDCG@10 -save Model_LMART_Graded_Orders.txt
- To evaluate the saved model on the other metrics {NDCG@5, P@5, P@10, RR, MAP}, run this command:
java -jar RankLib-2.1-patched.jar -load Model_LMART_Graded_Orders.txt -test LambdaMART_files/New_Graded_Order_DevFile.csv -metric2T NDCG@5
- Effect of DNN architecture
- To reproduce the results in Table 4 of the paper, refer to MatchZoo.
- MatchZoo has comprehensive documentation on the dependencies and how to run the models.
- For a fair comparison with the S-CNN model, we modified the models in MatchZoo by adding a fully connected layer before the last layer, so that the dense features can be utilized.
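The modification can be pictured with a small NumPy forward pass: the text-matching representation produced by any MatchZoo model is concatenated with the dense features and passed through the added fully connected layer before the final output. The shapes, parameter names, and feature examples below are illustrative assumptions, not the modified MatchZoo code.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(text_repr, dense_feats, params):
    """Join a text-matching representation with dense features, then apply an
    extra fully connected layer before the final 2-way output."""
    W_fc, b_fc, W_out, b_out = params
    joined = np.concatenate([text_repr, dense_feats])
    hidden = np.maximum(0.0, joined @ W_fc + b_fc)    # the added FC layer
    return hidden @ W_out + b_out                     # final logits

text_repr   = rng.normal(size=(128,))   # output of a text-matching model
dense_feats = rng.normal(size=(10,))    # e.g. price or popularity features

params = (rng.normal(scale=0.05, size=(138, 32)), np.zeros(32),
          rng.normal(scale=0.05, size=(32, 2)),   np.zeros(2))
logits = forward(text_repr, dense_feats, params)
print(logits.shape)  # (2,)
```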
Dependencies
- Python 2.7 or higher
- numpy
- keras
- trec_eval
For more information about the dataset and model please refer to the paper.
For any questions or bug reports, you can open an issue here.