Sarcasm-Detection-using-HNN

This is the PyTorch implementation of the work presented in 'Modelling Context with User Embeddings for Sarcasm Detection in Social Media' (https://arxiv.org/pdf/1607.00976.pdf). The neural network takes a tweet (content) and the corresponding user embedding (context) as input, and classifies the tweet as sarcastic or non-sarcastic. We further provide an implementation of our improved framework proposed in 'Sarcasm Detection using Hybrid Neural Network' (https://arxiv.org/abs/1908.07414).
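
To make the content + context idea concrete, here is a minimal, framework-free sketch of a single forward pass: a 1-D convolution with max-over-time pooling over the tweet's word vectors, the pooled features concatenated with the author's user embedding, and a linear layer with softmax over two classes. It is an illustration under simplifying assumptions (one filter width, toy random weights, no training, plain Python lists instead of PyTorch tensors, class order [non-sarcastic, sarcastic]); the function names are hypothetical and this is not the repository's implementation.

```python
import math
import random


def conv_max_pool(words, filters, width):
    """Slide each filter over windows of `width` consecutive word vectors,
    apply ReLU, and keep the maximum response (max-over-time pooling)."""
    features = []
    for filt in filters:  # filt: flat list of width * dim weights
        best = 0.0  # ReLU outputs are >= 0, so 0.0 is a safe starting maximum
        for start in range(len(words) - width + 1):
            window = [x for vec in words[start:start + width] for x in vec]
            activation = sum(w * x for w, x in zip(filt, window))
            best = max(best, max(0.0, activation))
        features.append(best)
    return features


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(words, user_vec, filters, width, out_weights, out_bias):
    """Hypothetical simplified model: returns [p(non-sarcastic), p(sarcastic)]."""
    content = conv_max_pool(words, filters, width)  # tweet representation
    joint = content + user_vec  # concatenate content and context vectors
    logits = [sum(w * x for w, x in zip(row, joint)) + b
              for row, b in zip(out_weights, out_bias)]
    return softmax(logits)


if __name__ == "__main__":
    random.seed(0)
    dim, width, n_filters, user_dim = 4, 2, 3, 5
    tweet = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(6)]
    user = [random.uniform(-1, 1) for _ in range(user_dim)]
    filters = [[random.uniform(-1, 1) for _ in range(width * dim)]
               for _ in range(n_filters)]
    out_w = [[random.uniform(-1, 1) for _ in range(n_filters + user_dim)]
             for _ in range(2)]
    print(classify(tweet, user, filters, width, out_w, [0.0, 0.0]))
```

The point of the concatenation step is that two identical tweets from different users can receive different scores, because the user vector shifts the logits.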

System requirements

- Python 2.7
- PyTorch 0.3.1
- Python package gensim
- Python package yandex.translate
- Python package ipdb

Running the code

1. Pre-requisites

Get pre-trained word embeddings (e.g. Skip-gram):
- Download the .bin file from this link
- Unzip the .bin.gz file and run the IPython notebook get_word2vec_embeddings.ipynb
- Place the resulting .txt file in DATA/embeddings/ and rename it to words.txt
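
The notebook's job in this step is to turn the binary word2vec file into a plain-text one. As a rough, dependency-free sketch of that conversion (written for Python 3, and assuming the standard word2vec binary layout: a "<vocab_size> <dim>" header line, then each word followed by a space and <dim> little-endian float32 values), it could look like this; the function name is hypothetical and the notebook may do it differently. In gensim 1.0 and later, `KeyedVectors.load_word2vec_format(path, binary=True)` followed by `save_word2vec_format(out_path, binary=False)` performs the same conversion.

```python
import struct


def convert_word2vec_bin_to_txt(bin_path, txt_path):
    """Hypothetical helper: rewrite a word2vec binary file as text, i.e. a
    "<vocab_size> <dim>" header followed by one "word v1 v2 ..." line per word.
    Assumes little-endian float32 vectors (the usual case on x86 machines)."""
    with open(bin_path, "rb") as fin, open(txt_path, "w", encoding="utf-8") as fout:
        vocab_size, dim = (int(x) for x in fin.readline().split())
        fout.write("%d %d\n" % (vocab_size, dim))
        for _ in range(vocab_size):
            # Read the word byte by byte up to the separating space.
            chars = bytearray()
            while True:
                ch = fin.read(1)
                if ch == b" ":
                    break
                if ch == b"":
                    raise ValueError("unexpected end of file")
                if ch != b"\n":  # the original tool writes a newline after each vector
                    chars.extend(ch)
            word = chars.decode("utf-8", errors="replace")
            vector = struct.unpack("<%df" % dim, fin.read(4 * dim))
            fout.write(word + " " + " ".join("%.6f" % v for v in vector) + "\n")
```

For example, `convert_word2vec_bin_to_txt("GoogleNews-vectors-negative300.bin", "DATA/embeddings/words.txt")` would write the file directly under the name expected above (the input file name is only an example).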

Get pre-trained user embeddings for the users. The embeddings we used can be found here. Place the embeddings in DATA/embeddings/ and name the file usr2vec.txt.
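
Both words.txt and usr2vec.txt are read as lookup tables from a token (a word or a user) to a vector. A minimal sketch of such a loader follows, assuming (not confirmed by this README) that usr2vec.txt uses the same whitespace-separated layout as words.txt, with the user identifier as the first field and an optional "<count> <dim>" header line; `load_embeddings` is a hypothetical name.

```python
def load_embeddings(path):
    """Hypothetical loader: read "token v1 v2 ..." lines into {token: [floats]},
    skipping an optional word2vec-style "<count> <dim>" header on the first line."""
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f):
            parts = line.split()
            if line_no == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue  # header: vocabulary size and vector dimension
            if len(parts) < 2:
                continue  # blank or malformed line
            embeddings[parts[0]] = [float(x) for x in parts[1:]]
    dims = {len(v) for v in embeddings.values()}
    if len(dims) > 1:
        raise ValueError("inconsistent embedding sizes: %s" % sorted(dims))
    return embeddings
```

Checking that all vectors share one dimension catches a truncated or wrongly formatted file before it reaches the model.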

Execute the IPython notebook get_data.ipynb. This utility code downloads the tweets corresponding to the tweet IDs and then preprocesses them.
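
The README does not spell out what the preprocessing in get_data.ipynb consists of. As an illustration of typical tweet normalisation only, the sketch below lowercases the text, replaces links and @mentions with the hypothetical placeholder tokens `<url>` and `<user>`, and splits the rest into words, hashtags and punctuation. Because labels in hashtag-based sarcasm corpora can leak through tags such as #sarcasm, it also drops a configurable set of hashtags; whether the notebook does this is an assumption.

```python
import re

TOKEN_RE = re.compile(
    r"(?P<url>https?://\S+)"       # links
    r"|(?P<user>@\w+)"             # @mentions
    r"|(?P<word>#?\w+(?:'\w+)?)"   # words, hashtags and simple contractions
    r"|(?P<punct>[^\w\s])"         # any other single non-space symbol
)


def preprocess_tweet(text, drop_hashtags=("#sarcasm", "#sarcastic")):
    """Hypothetical preprocessing: lowercase, normalise URLs and mentions,
    tokenize, and remove label-bearing hashtags."""
    tokens = []
    for match in TOKEN_RE.finditer(text.lower()):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == "url":
            tokens.append("<url>")
        elif kind == "user":
            tokens.append("<user>")
        elif match.group() not in drop_hashtags:
            tokens.append(match.group())
    return tokens
```

For instance, "Loving this #Monday @bob http://t.co/xyz #sarcasm!" becomes `["loving", "this", "#monday", "<user>", "<url>", "!"]`.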

2. Training and Evaluation

a. To run the original CUE-CNN model

Run python train_CUE_CNN.py

b. To run the hybrid RNN + CNN model on the new (news headlines) dataset

Run python Headlines_RNN.py
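
Before the hybrid model can be trained, the headlines have to be read and split into train, validation and test sets. The sketch below assumes the file holds one JSON object per line with `headline` and `is_sarcastic` fields, as in the publicly released News Headlines dataset; how Headlines_RNN.py actually loads and splits the data is not stated here, so the function names and the 80/10/10 split are hypothetical.

```python
import json
import random


def load_headlines(path):
    """Hypothetical loader: read one JSON object per line into (headline, label) pairs."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            examples.append((record["headline"], int(record["is_sarcastic"])))
    return examples


def split_dataset(examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle with a fixed seed, then cut into train/validation/test lists."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)  # local RNG: reproducible, no global state
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test
```

Fixing the seed keeps the three splits identical across runs, so accuracies logged by different runs are comparable.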

Output, results and visualization

The code generates a progress folder that contains a subfolder for every run. Inside each run folder, the following two files are generated:

logs.txt, which contains the loss and accuracy on the train/test/validation sets after every epoch
stats.jpg, which plots
- train/test/validation loss on a single plot
- train/test/validation accuracy on a single plot
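
A per-run folder with an append-only log can be organised roughly as follows. This is a sketch under stated assumptions: the timestamped folder names and the "epoch N metric value ..." line format are hypothetical, since the exact layout of logs.txt is not documented here. The parsed history is the kind of data one would hand to a plotting library to draw curves like those in stats.jpg.

```python
import os
import time


def make_run_dir(root="progress"):
    """Create a fresh progress/<timestamp>[-N]/ folder for one run."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    run_dir = os.path.join(root, stamp)
    suffix = 1
    while os.path.exists(run_dir):  # two runs started in the same second
        run_dir = os.path.join(root, "%s-%d" % (stamp, suffix))
        suffix += 1
    os.makedirs(run_dir)
    return run_dir


def log_epoch(run_dir, epoch, metrics):
    """Append one hypothetical line, e.g. 'epoch 3 train_loss 0.4120 val_acc 0.8100'."""
    fields = " ".join("%s %.4f" % (k, metrics[k]) for k in sorted(metrics))
    with open(os.path.join(run_dir, "logs.txt"), "a") as f:
        f.write("epoch %d %s\n" % (epoch, fields))


def read_log(run_dir):
    """Parse logs.txt back into one {metric: value} dict per epoch."""
    history = []
    with open(os.path.join(run_dir, "logs.txt")) as f:
        for line in f:
            parts = line.split()
            record = {"epoch": int(parts[1])}
            for name, value in zip(parts[2::2], parts[3::2]):
                record[name] = float(value)
            history.append(record)
    return history
```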

Note:

The util files, pre-trained user embeddings and raw tweet IDs were obtained from the original CUE-CNN implementation.

Cite

If you use the dataset, please cite the following works in a suitable format:

Text Format:

1. Misra, Rishabh and Prahal Arora. "Sarcasm Detection using News Headlines Dataset." AI Open (2023).
2. Misra, Rishabh and Jigyasa Grover. "Sculpting Data for ML: The first act of Machine Learning." ISBN 978-0-578-83125-1 (2021).

BibTeX Format:

@article{misra2023Sarcasm,
  title = {Sarcasm Detection using News Headlines Dataset},
  journal = {AI Open},
  volume = {4},
  pages = {13-18},
  year = {2023},
  issn = {2666-6510},
  doi = {10.1016/j.aiopen.2023.01.001},
  url = {https://www.sciencedirect.com/science/article/pii/S2666651023000013},
  author = {Rishabh Misra and Prahal Arora},
}

@book{misra2021sculpting,
  title = {Sculpting Data for ML: The first act of Machine Learning},
  author = {Misra, Rishabh and Grover, Jigyasa},
  year = {2021},
  month = {01},
  isbn = {978-0-578-83125-1},
}
