sentiment-analysis-in-russian
Fine-tuned Multilingual BERT and Multilingual USE for sentiment analysis in Russian. RuReviews, RuSentiment, Kaggle Russian News Dataset, LINIS Crowd, and RuTweetCorp were utilized as training data.
Sentiment Analysis in Russian
This repository contains links to models for sentiment analysis of Russian-language texts. The models were trained for the articles "Evaluation of Pre-Trained Transformers for Sentiment Analysis of Texts in Russian" and "Deep Transfer Learning Baselines for Sentiment Analysis in Russian".
Evaluation of Pre-Trained Transformers for Sentiment Analysis of Texts in Russian
Model | Score | Rank | SentiRuEval-2016 TC micro F1 | SentiRuEval-2016 TC macro F1 | SentiRuEval-2016 TC F1 | SentiRuEval-2016 Banks micro F1 | SentiRuEval-2016 Banks macro F1 | SentiRuEval-2016 Banks F1 | RuSentiment weighted F1 | RuSentiment F1 | KRND F1 | LINIS Crowd F1 | RuTweetCorp F1 | RuReviews F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SOTA | n/s | | 76.71 | 66.40 | 70.68 | 67.51 | 69.53 | 74.06 | 78.50 | n/s | 73.63 | 60.51 | 83.68 | 77.44 |
XLM-RoBERTa-Large | 76.37 | 1 | 82.26 | 76.36 | 79.42 | 76.35 | 76.08 | 80.89 | 78.31 | 75.27 | 75.17 | 60.03 | 88.91 | 78.81 |
SBERT-Large | 75.43 | 2 | 78.40 | 71.36 | 75.14 | 72.39 | 71.87 | 77.72 | 78.58 | 75.85 | 74.20 | 60.64 | 88.66 | 77.41 |
MBARTRuSumGazeta | 74.70 | 3 | 76.06 | 68.95 | 73.04 | 72.34 | 71.93 | 77.83 | 76.71 | 73.56 | 74.18 | 60.54 | 87.22 | 77.51 |
Conversational RuBERT | 74.44 | 4 | 76.69 | 69.09 | 73.11 | 69.44 | 68.68 | 75.56 | 77.31 | 74.40 | 73.10 | 59.95 | 87.86 | 77.78 |
LaBSE | 74.11 | 5 | 77.00 | 69.19 | 73.55 | 70.34 | 69.83 | 76.38 | 74.94 | 70.84 | 73.20 | 59.52 | 87.89 | 78.47 |
XLM-RoBERTa-Base | 73.60 | 6 | 76.35 | 69.37 | 73.42 | 68.45 | 67.45 | 74.05 | 74.26 | 70.44 | 71.40 | 60.19 | 87.90 | 78.28 |
RuBERT | 73.45 | 7 | 74.03 | 66.14 | 70.75 | 66.46 | 66.40 | 73.37 | 75.49 | 71.86 | 72.15 | 60.55 | 86.99 | 77.41 |
MBART-50-Large-Many-to-Many | 73.15 | 8 | 75.38 | 67.81 | 72.26 | 67.13 | 66.97 | 73.85 | 74.78 | 70.98 | 71.98 | 59.20 | 87.05 | 77.24 |
SlavicBERT | 71.96 | 9 | 71.45 | 63.03 | 68.44 | 64.32 | 63.99 | 71.31 | 72.13 | 67.57 | 72.54 | 58.70 | 86.43 | 77.16 |
EnRuDR-BERT | 71.51 | 10 | 72.56 | 64.74 | 69.07 | 61.44 | 60.21 | 68.34 | 74.19 | 69.94 | 69.33 | 56.55 | 87.12 | 77.95 |
RuDR-BERT | 71.14 | 11 | 72.79 | 64.23 | 68.36 | 61.86 | 60.92 | 68.48 | 74.65 | 70.63 | 68.74 | 54.45 | 87.04 | 77.91 |
MBART-50-Large | 69.46 | 12 | 70.91 | 62.67 | 67.24 | 61.12 | 60.25 | 68.41 | 72.88 | 68.63 | 70.52 | 46.39 | 86.48 | 77.52 |
Deep Transfer Learning Baselines for Sentiment Analysis in Russian
This repository contains the fine-tuned Multilingual Bidirectional Encoder Representations from Transformers (M-BERT), RuBERT, and two versions of Multilingual Universal Sentence Encoder (M-USE) for sentiment classification in Russian, as described in the article "Deep Transfer Learning Baselines for Sentiment Analysis in Russian".
Dataset | Measure | Current SOTA | M-BERT | RuBERT | M-USE-CNN | M-USE-Trans |
---|---|---|---|---|---|---|
SentiRuEval-2016 TC | F1 | 68.42 | 66.29 | 70.68 | 63.64 | 68.27 |
SentiRuEval-2016 TC | macro F1 (PN) | 66.07 | 61.78 | 66.40 | 58.97 | 62.77 |
SentiRuEval-2016 TC | micro F1 (PN) | 74.11 | 72.45 | 76.71 | 71.31 | 75.00 |
SentiRuEval-2016 Banks | F1 | 74.06 | 65.31 | 72.83 | 66.71 | 72.40 |
SentiRuEval-2016 Banks | macro F1 (PN) | 69.53 | 58.00 | 65.89 | 58.73 | 65.04 |
SentiRuEval-2016 Banks | micro F1 (PN) | 71.76 | 60.52 | 68.43 | 62.41 | 68.21 |
SentiRuEval-2016 TC | F1 | 68.54 | 60.47 | 64.39 | 60.57 | 64.28 |
SentiRuEval-2016 TC | macro F1 (PN) | 63.47 | 53.16 | 57.76 | 52.37 | 57.60 |
SentiRuEval-2016 TC | micro F1 (PN) | 67.51 | 57.03 | 61.38 | 57.76 | 61.18 |
SentiRuEval-2016 Banks | F1 | 79.51 | 67.65 | 70.58 | 66.32 | 69.62 |
SentiRuEval-2016 Banks | macro F1 (PN) | 67.44 | 56.97 | 60.95 | 54.74 | 59.12 |
SentiRuEval-2016 Banks | micro F1 (PN) | 70.09 | 59.32 | 63.33 | 57.61 | 62.17 |
RuSentiment | F1 | n/s | 71.37 | 72.03 | 66.27 | 68.60 |
RuSentiment | weighted F1 | 78.50 | 75.13 | 75.71 | 71.05 | 73.42 |
Kaggle Russian News Dataset | F1 | 70.00 | 71.36 | 73.63 | 71.27 | 72.66 |
LINIS Crowd | F1 | 37.29 | 42.73 | 60.51 | 56.34 | 56.95 |
RuTweetCorp (binary) | F1 | 75.95 | 83.04 | 83.69 | 81.34 | 83.17 |
RuTweetCorp (trinary) | F1 | 78.1 | 80.10 | 80.79 | 78.39 | 79.69 |
RuReviews | F1 | 75.45 | 77.31 | 77.44 | 76.63 | 76.94 |
SOTA approaches for RuReviews, RuSentiment, Kaggle Russian News Dataset, and RuTweetCorp were described in (Smetanin and Komarov, 2019), (Baymurzina et al., 2019), (Shalkarbayuli et al., 2018), and (Rubtsova, 2018), respectively. The SOTA approach for LINIS Crowd was implemented based on (Koltsova et al., 2016).
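The results tables report several F1 variants (macro, micro, and weighted). The following is a minimal sketch of how these averages differ, not the evaluation code used in the articles. The function names are hypothetical, and it assumes that the PN measures average over the positive and negative classes only, which is achieved here by passing just those two labels.

```python
def class_counts(y_true, y_pred, label):
    """Count true positives, false positives, and false negatives for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    return tp, fp, fn


def f1_from_counts(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of the per-class F1 scores over the given labels."""
    scores = [f1_from_counts(*class_counts(y_true, y_pred, l)) for l in labels]
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred, labels):
    """F1 computed from counts pooled over the given labels."""
    counts = [class_counts(y_true, y_pred, l) for l in labels]
    tp, fp, fn = (sum(column) for column in zip(*counts))
    return f1_from_counts(tp, fp, fn)


def weighted_f1(y_true, y_pred, labels):
    """Mean of per-class F1 scores, weighted by the number of true samples per class."""
    total, support_sum = 0.0, 0
    for l in labels:
        tp, fp, fn = class_counts(y_true, y_pred, l)
        support = tp + fn
        total += support * f1_from_counts(tp, fp, fn)
        support_sum += support
    return total / support_sum if support_sum else 0.0


if __name__ == "__main__":
    # Toy three-class example; the labels are illustrative only.
    y_true = ["pos", "pos", "neg", "neu", "neu", "neu"]
    y_pred = ["pos", "neg", "neg", "neu", "neu", "pos"]
    print("macro F1 (PN):", macro_f1(y_true, y_pred, ["pos", "neg"]))
    print("micro F1 (PN):", micro_f1(y_true, y_pred, ["pos", "neg"]))
    print("weighted F1:  ", weighted_f1(y_true, y_pred, ["pos", "neg", "neu"]))
```

Restricting the label list is what separates a PN-style score from an all-class score: the neutral class still contributes false positives and false negatives to the other two classes, but it gets no F1 term of its own.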
Sentiment Datasets in Russian
Although Russian is one of the most common languages on the World Wide Web, it is generally not as well-resourced as English, especially in the field of sentiment analysis. Even though many studies address sentiment classification, only a few of them make their datasets publicly available to the research community.
Dataset | Classes | Average length | Max length | Train samples | Test samples | Overall samples | Download link |
---|---|---|---|---|---|---|---|
SentiRuEval-2016 (Loukachevitch and Rubtsova, 2016) | 3 | 87.0928 | 172 | 18,035 | 5,560 | 23,595 | Project page |
SentiRuEval-2015 Subtask (Loukachevitch et al., 2015) | 3 | 81.4986 | 172 | 8,580 | 7,738 | 16,318 | Project page |
RuTweetCorp (Rubtsova, 2013) | 3 | 89.1725 | 189 | n/a | n/a | 334,836 | Project page |
LINIS Crowd (Koltsova et al., 2016) | 5 | n/a | n/a | n/a | n/a | n/a | Project page |
RuSentiment (Rogers et al., 2018) | 5 | 82.0279 | 800 | 28,218 | 2,967 | 31,185 | Project page |
Kaggle Russian News Dataset | 3 | 3911.8501 | 381,498 | n/a | n/a | 8,263 | Kaggle page |
RuReviews (Smetanin and Komarov, 2019) | 3 | 130.0693 | 1007 | n/a | n/a | 90,000 | GitHub page |
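The length columns in the table above can be reproduced for any dataset with a few lines of code. The sketch below is illustrative only: `length_stats` is a hypothetical helper, and it assumes that length is measured in characters, which the table does not state.

```python
def length_stats(texts):
    """Return (average length, maximum length) of the texts, measured in characters.

    Measuring in characters is an assumption; the table does not state the unit.
    The average is rounded to four decimal places to match the table's formatting.
    """
    if not texts:
        raise ValueError("texts must not be empty")
    lengths = [len(text) for text in texts]
    return round(sum(lengths) / len(lengths), 4), max(lengths)


if __name__ == "__main__":
    # Made-up sample texts, not taken from any of the listed datasets.
    sample = ["Отличный товар", "Плохо", "Нормально, но дорого"]
    average, longest = length_stats(sample)
    print(f"Average length: {average}, max length: {longest}")
```

If the original statistics were computed over tokens rather than characters, replacing `len(text)` with a token count would be the only change needed.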
Fine-Tuned Models
To download fine-tuned models for Russian, please follow the link https://yadi.sk/d/Xp5vLG_5xCQL-Q.
Citation
@article{Smetanin2020Deep,
title = {Deep transfer learning baselines for sentiment analysis in Russian},
author = {Sergey Smetanin and Mikhail Komarov},
journal = {Information Processing \& Management},
volume = {58},
number = {3},
pages = {102484},
year = {2021},
issn = {0306-4573},
doi = {https://doi.org/10.1016/j.ipm.2020.102484},
url = {https://www.sciencedirect.com/science/article/pii/S0306457320309730}
}
License
See LICENSE.