CRM-LTR
Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products
This repository contains the code and the Commercial Dataset needed to reproduce the results presented in the following paper: Mend The Learning Approach, Not the Data: Insights for Ranking E-Commerce Products.
Note: Updates to the repository coming soon.
Commercial Dataset: E-commerce dataset for LTR
The dataset and its description can be found here.
Neural network architecture of S-CNN
We selected a simple yet powerful CNN model proposed by Severyn et al. [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.723.6492&rep=rep1&type=pdf] for the empirical evaluation of our CRM approach. We refer to this model as S-CNN in the paper. The figure below depicts the architecture of the neural network; the figure is taken from the paper by Severyn et al. The Keras implementation was adapted from https://github.com/gvishal/rank_text_cnn.
Evaluation
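As a rough illustration of this architecture, the forward pass can be sketched in plain NumPy: each sentence matrix goes through a convolution with max-over-time pooling, a bilinear similarity score joins the two pooled vectors, and a hidden layer feeds a 2-way softmax. All dimensions, parameter names, and random inputs below are illustrative assumptions, not the actual S-CNN implementation in this repository.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d_relu(x, filters):
    """Valid 1-D convolution over a sentence matrix x of shape (len, emb_dim)."""
    num_f, width, emb_dim = filters.shape
    out_len = x.shape[0] - width + 1
    out = np.empty((out_len, num_f))
    for i in range(out_len):
        window = x[i:i + width]  # (width, emb_dim)
        out[i] = np.maximum(0.0, np.tensordot(filters, window, axes=([1, 2], [0, 1])))
    return out

def sentence_vector(x, filters):
    """Convolution followed by max-over-time pooling -> fixed-size vector."""
    return conv1d_relu(x, filters).max(axis=0)

emb_dim, width, num_f = 50, 5, 100
filters = rng.normal(scale=0.1, size=(num_f, width, emb_dim))

query   = rng.normal(size=(7, emb_dim))    # toy query sentence matrix
product = rng.normal(size=(12, emb_dim))   # toy product-title sentence matrix

xq = sentence_vector(query, filters)       # (100,)
xd = sentence_vector(product, filters)     # (100,)

# Similarity layer: sim = xq^T M xd, joined with the pooled vectors.
M = rng.normal(scale=0.1, size=(num_f, num_f))
sim = xq @ M @ xd
join = np.concatenate([xq, [sim], xd])     # (201,)

# Hidden layer plus 2-way softmax (relevant vs. not relevant).
W_h = rng.normal(scale=0.05, size=(join.size, 64)); b_h = np.zeros(64)
W_o = rng.normal(scale=0.05, size=(64, 2));         b_o = np.zeros(2)
h = np.maximum(0.0, join @ W_h + b_h)
logits = h @ W_o + b_o
probs = np.exp(logits - logits.max()); probs /= probs.sum()
print(probs.shape)  # (2,)
```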
Reproducing the results
First, install Jupyter Notebook using the following command (for details, click here):
pip3 install jupyter
If you are not familiar with running a notebook, click here.
Download the trec_eval tool and the Mercateo Dataset.
- CRM_Training
- crm_training_clicks.ipynb: Run this Jupyter notebook to train the CRM model from AtB click logs [reproduces the results in Table 2 of the paper].
- crm_training_orders.ipynb: Run this Jupyter notebook to train the CRM model from order logs [reproduces the results in Table 3 of the paper].
- crm_model.py: Keras implementation of CNN for short text pairs with counterfactual risk minimization (CRM) loss function.
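As background for the CRM loss, counterfactual risk minimization trains the new model on logged bandit feedback by minimizing a clipped inverse-propensity-scored estimate of the risk. The NumPy sketch below illustrates such an estimator; the function name, the toy numbers, and the exact form of the loss are assumptions for exposition, not the code in crm_model.py.

```python
import numpy as np

def crm_ips_loss(model_probs, propensities, losses, clip_m=10.0):
    """Clipped inverse-propensity-scored empirical risk (CRM-style objective).

    model_probs  : probability the new model assigns to each logged action
    propensities : probability the logging policy assigned to that action
    losses       : observed loss for each logged (context, action) pair
                   (e.g. 0 for a click/order, 1 otherwise)
    clip_m       : clipping constant M bounding the importance weights
    """
    weights = np.minimum(clip_m, model_probs / propensities)
    return float(np.mean(losses * weights))

# Toy logged bandit feedback for three impressions.
model_probs  = np.array([0.9, 0.2, 0.6])
propensities = np.array([0.5, 0.5, 0.1])
losses       = np.array([0.0, 1.0, 1.0])

print(crm_ips_loss(model_probs, propensities, losses))
```

Clipping the importance weights (the min with M) is what keeps the estimator's variance under control when the logging policy assigned a very small propensity to an action.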
- Cross_Entropy_Training
- cross_entropy_training_clicks.ipynb: Run this Jupyter notebook to train the CNN model with the cross-entropy loss [reproduces the results in Table 2 of the paper].
- cross_entropy_training_orders.ipynb: Run this Jupyter notebook to train the CNN model with the cross-entropy loss [reproduces the results in Table 3 of the paper].
- model_cross_entropy.py: Keras implementation of CNN for short text pairs with cross-entropy loss.
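For contrast with the CRM objective, the cross-entropy baseline treats the logged feedback as supervised labels for a 2-way softmax. A minimal, numerically stable NumPy sketch of such a loss (illustrative only, not the code in model_cross_entropy.py):

```python
import numpy as np

def cross_entropy_loss(logits, labels):
    """Mean softmax cross-entropy; labels in {0, 1} (not relevant / relevant)."""
    z = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

logits = np.array([[2.0, 0.5],
                   [0.1, 1.9]])
labels = np.array([0, 1])
print(cross_entropy_loss(logits, labels))
```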
- LambdaMART Training
- Download the binary file of RankLib tool from here.
- We used the latest binary, 'RankLib-2.1-patched.jar', for our experiments.
- Train the LambdaMART model for Graded Order Labels [Table 3] by running this command:
java -jar RankLib-2.1-patched.jar -train LambdaMART_files/New_Graded_Order_TrainFile.csv -test LambdaMART_files/New_Graded_Order_TestFile.csv -validate LambdaMART_files/New_Graded_Order_DevFile.csv -ranker 6 -metric2t NDCG@10 -metric2T NDCG@10 -save Model_LMART_Graded_Orders.txt
- To evaluate the saved model on the other metrics {NDCG@5, P@5, P@10, RR, MAP}, run this command:
java -jar RankLib-2.1-patched.jar -load Model_LMART_Graded_Orders.txt -test LambdaMART_files/New_Graded_Order_DevFile.csv -metric2T NDCG@5
- Effect of DNN architecture
- To reproduce the results in Table 4 of the paper, refer to MatchZoo.
- MatchZoo has comprehensive documentation on the dependencies and how to run the models.
- For a fair comparison with the S-CNN model, we modified the models in MatchZoo by adding a fully connected layer before the last layer, so that the dense features can be utilized.
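The modification can be pictured with a small NumPy forward pass: the text-matching representation produced by any MatchZoo model is concatenated with the dense features and passed through the added fully connected layer before the final output. The shapes, parameter names, and feature examples below are illustrative assumptions, not the modified MatchZoo code.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(text_repr, dense_feats, params):
    """Join a text-matching representation with dense features, then apply an
    extra fully connected layer before the final 2-way output."""
    W_fc, b_fc, W_out, b_out = params
    joined = np.concatenate([text_repr, dense_feats])
    hidden = np.maximum(0.0, joined @ W_fc + b_fc)    # the added FC layer
    return hidden @ W_out + b_out                     # final logits

text_repr   = rng.normal(size=(128,))   # output of a text-matching model
dense_feats = rng.normal(size=(10,))    # e.g. price or popularity features

params = (rng.normal(scale=0.05, size=(138, 32)), np.zeros(32),
          rng.normal(scale=0.05, size=(32, 2)),   np.zeros(2))
logits = forward(text_repr, dense_feats, params)
print(logits.shape)  # (2,)
```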
Dependencies
- Python 2.7 or higher
- numpy
- keras
- trec_eval
For more information about the dataset and model please refer to the paper.
For any questions or bug reports, you can open an issue here.