nlp_tasks

Natural Language Processing Tasks and Selected References

I've been working on several natural language processing tasks for a long time. One day, I felt like drawing a map of the NLP field where I earn a living. I'm sure I'm not the only person who wants to see at a glance which tasks are in NLP.

I did my best to cover as many NLP tasks as possible, but admittedly the list is far from exhaustive, purely because of the limits of my own knowledge. The selected references are also biased towards recent deep learning work. I expect them to serve as a starting point when you're about to dig into a task. I'll keep updating this repo myself, but what I really hope is that you will collaborate on this work. Don't hesitate to send me a pull request!

Oct. 13, 2017.
by Kyubyong

Reviewed and updated by YJ Choe on Oct. 18, 2017.

Anaphora Resolution

See Coreference Resolution

Automated Essay Scoring

PAPER Automatic Text Scoring Using Neural Networks
PAPER A Neural Approach to Automated Essay Scoring
CHALLENGE Kaggle: The Hewlett Foundation: Automated Essay Scoring
PROJECT EASE (Enhanced AI Scoring Engine)

Automatic Speech Recognition

WIKI Speech recognition
PAPER Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
PAPER WaveNet: A Generative Model for Raw Audio
PROJECT A TensorFlow implementation of Baidu's DeepSpeech architecture
PROJECT Speech-to-Text-WaveNet: End-to-end sentence level English speech recognition using DeepMind's WaveNet
CHALLENGE The 5th CHiME Speech Separation and Recognition Challenge
DATA The 5th CHiME Speech Separation and Recognition Challenge
DATA CSTR VCTK Corpus
DATA LibriSpeech ASR corpus
DATA Switchboard-1 Telephone Speech Corpus
DATA TED-LIUM Corpus
DATA Open Speech and Language Resources
DATA Common Voice

Automatic Summarisation

WIKI Automatic summarization
BOOK Automatic Text Summarization
PAPER Text Summarization Using Neural Networks
PAPER Ranking with Recursive Neural Networks and Its Application to Multi-Document Summarization
DATA Text Analysis Conference (TAC)
DATA Document Understanding Conferences (DUC)

Coreference Resolution

INFO Coreference Resolution
PAPER Deep Reinforcement Learning for Mention-Ranking Coreference Models
PAPER Improving Coreference Resolution by Learning Entity-Level Distributed Representations
CHALLENGE CoNLL 2012 Shared Task: Modeling Multilingual Unrestricted Coreference in OntoNotes
CHALLENGE CoNLL 2011 Shared Task: Modeling Unrestricted Coreference in OntoNotes
CHALLENGE SemEval 2018 Task 4: Character Identification on Multiparty Dialogues

Entity Linking

See Named Entity Disambiguation

Grammatical Error Correction

PAPER A Multilayer Convolutional Encoder-Decoder Neural Network for Grammatical Error Correction
PAPER Neural Network Translation Models for Grammatical Error Correction
PAPER Adapting Sequence Models for Sentence Correction
CHALLENGE CoNLL-2013 Shared Task: Grammatical Error Correction
CHALLENGE CoNLL-2014 Shared Task: Grammatical Error Correction
DATA NUS Non-commercial research/trial corpus license
DATA Lang-8 Learner Corpora
DATA Cornell Movie--Dialogs Corpus
PROJECT Deep Text Corrector
PRODUCT deep grammar

Grapheme To Phoneme Conversion

PAPER Grapheme-to-Phoneme Models for (Almost) Any Language
PAPER Polyglot Neural Language Models: A Case Study in Cross-Lingual Phonetic Representation Learning
PAPER Multitask Sequence-to-Sequence Models for Grapheme-to-Phoneme Conversion
PROJECT Sequence-to-Sequence G2P toolkit
PROJECT g2p_en: A Simple Python Module for English Grapheme To Phoneme Conversion
DATA Multilingual Pronunciation Data

Humor and Sarcasm Detection

PAPER Automatic Sarcasm Detection: A Survey
PAPER Magnets for Sarcasm: Making Sarcasm Detection Timely, Contextual and Very Personal
PAPER Sarcasm Detection on Twitter: A Behavioral Modeling Approach
CHALLENGE SemEval-2017 Task 6: #HashtagWars: Learning a Sense of Humor
CHALLENGE SemEval-2017 Task 7: Detection and Interpretation of English Puns
DATA Sarcastic comments from Reddit
DATA Sarcasm Corpus V2
DATA Sarcasm Amazon Reviews Corpus

Language Grounding

WIKI Symbol grounding problem
PAPER The Symbol Grounding Problem
PAPER From phonemes to images: levels of representation in a recurrent neural model of visually-grounded language learning
PAPER Encoding of phonology in a recurrent neural model of grounded speech
PAPER Gated-Attention Architectures for Task-Oriented Language Grounding
PAPER Sound-Word2Vec: Learning Word Representations Grounded in Sounds
COURSE Language Grounding to Vision and Control
WORKSHOP Language Grounding for Robotics

Language Guessing

See Language Identification

Language Identification

WIKI Language identification
PAPER Automatic Language Identification Using Deep Neural Networks
PAPER Natural Language Processing with Small Feed-Forward Networks
CHALLENGE 2015 Language Recognition Evaluation

Language Modeling

WIKI Language model
TOOLKIT KenLM Language Model Toolkit
PAPER Distributed Representations of Words and Phrases and their Compositionality
PAPER Generating Sequences with Recurrent Neural Networks
PAPER Character-Aware Neural Language Models
THESIS Statistical Language Models Based on Neural Networks
DATA Penn Treebank
TUTORIAL TensorFlow Tutorial on Language Modeling with Recurrent Neural Networks

Language Recognition

See Language Identification

Lemmatisation

WIKI Lemmatisation
PAPER Joint Lemmatization and Morphological Tagging with LEMMING
TOOLKIT WordNet Lemmatizer
DATA Treebank-3

Lip-reading

WIKI Lip reading
PAPER LipNet: End-to-End Sentence-level Lipreading
PAPER Lip Reading Sentences in the Wild
PAPER Large-Scale Visual Speech Recognition
PROJECT Lip Reading - Cross Audio-Visual Recognition using 3D Convolutional Neural Networks
PRODUCT Liopa
DATA The GRID audiovisual sentence corpus
DATA The BBC-Oxford 'Multi-View Lip Reading Sentences' (MV-LRS) Dataset

Machine Translation

PAPER Neural Machine Translation by Jointly Learning to Align and Translate
PAPER Neural Machine Translation in Linear Time
PAPER Attention Is All You Need
PAPER Six Challenges for Neural Machine Translation
PAPER Phrase-Based & Neural Unsupervised Machine Translation
CHALLENGE ACL 2014 Ninth Workshop on Statistical Machine Translation
CHALLENGE EMNLP 2017 Second Conference on Machine Translation (WMT17)
DATA OpenSubtitles2016
DATA WIT3: Web Inventory of Transcribed and Translated Talks
DATA The QCRI Educational Domain (QED) Corpus
PAPER Multi-task Sequence to Sequence Learning
PAPER Unsupervised Pretraining for Sequence to Sequence Learning
PAPER Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
TOOLKIT Subword Neural Machine Translation with Byte Pair Encoding (BPE)
TOOLKIT Multi-Way Neural Machine Translation
TOOLKIT OpenNMT: Open-Source Toolkit for Neural Machine Translation

Morphological Inflection Generation

WIKI Inflection
PAPER Morphological Inflection Generation Using Character Sequence to Sequence Learning
CHALLENGE SIGMORPHON 2016 Shared Task: Morphological Reinflection
DATA sigmorphon2016

Named Entity Disambiguation

WIKI Entity linking
PAPER Robust and Collective Entity Disambiguation through Semantic Embeddings

Named Entity Recognition

WIKI Named-entity recognition
PAPER Neural Architectures for Named Entity Recognition
PROJECT OSU Twitter NLP Tools
CHALLENGE Named Entity Recognition in Twitter
CHALLENGE CoNLL 2002 Language-Independent Named Entity Recognition
CHALLENGE Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition
DATA CoNLL-2002 NER corpus
DATA CoNLL-2003 NER corpus
DATA NUT Named Entity Recognition in Twitter Shared task
TOOLKIT Stanford Named Entity Recognizer

Paraphrase Detection

PAPER Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
PROJECT Paralex: Paraphrase-Driven Learning for Open Question Answering
CHALLENGE SemEval-2015 Task 1: Paraphrase and Semantic Similarity in Twitter
DATA Microsoft Research Paraphrase Corpus
DATA Microsoft Research Video Description Corpus
DATA Pascal Dataset
DATA Flickr Dataset
DATA The SICK data set
DATA PPDB: The Paraphrase Database
DATA WikiAnswers Paraphrase Corpus

Paraphrase Generation

PAPER Neural Paraphrase Generation with Stacked Residual LSTM Networks
DATA Neural Paraphrase Generation with Stacked Residual LSTM Networks
CODE Neural Paraphrase Generation with Stacked Residual LSTM Networks
PAPER A Deep Generative Framework for Paraphrase Generation
PAPER Paraphrasing Revisited with Neural Machine Translation

Parsing

WIKI Parsing
TOOLKIT The Stanford Parser: A statistical parser
TOOLKIT spaCy parser
PAPER Grammar as a Foreign Language
PAPER A fast and accurate dependency parser using neural networks
PAPER Universal Semantic Parsing
CHALLENGE CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
CHALLENGE CoNLL 2016 Shared Task: Multilingual Shallow Discourse Parsing
CHALLENGE CoNLL 2015 Shared Task: Shallow Discourse Parsing
CHALLENGE SemEval-2016 Task 8: The meaning representations may be abstract, but this task is concrete!

Part-of-speech Tagging

WIKI Part-of-speech tagging
PAPER Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss
PAPER Unsupervised Part-Of-Speech Tagging with Anchor Hidden Markov Models
DATA Treebank-3
TOOLKIT nltk.tag package

Pinyin-To-Chinese Conversion

WIKI Pinyin input method
PAPER Neural Network Language Model for Chinese Pinyin Input Method Engine
PROJECT Neural Chinese Transliterator

Question Answering

WIKI Question answering
PAPER Ask Me Anything: Dynamic Memory Networks for Natural Language Processing
PAPER Dynamic Memory Networks for Visual and Textual Question Answering
CHALLENGE TREC Question Answering Task
CHALLENGE NTCIR-8: Advanced Cross-lingual Information Access (ACLIA)
CHALLENGE CLEF Question Answering Track
CHALLENGE SemEval-2017 Task 3: Community Question Answering
CHALLENGE SemEval-2018 Task 11: Machine Comprehension using Commonsense Knowledge
DATA MS MARCO: Microsoft MAchine Reading COmprehension Dataset
DATA Maluuba NewsQA
DATA SQuAD: 100,000+ Questions for Machine Comprehension of Text
DATA GraphQuestions: A Characteristic-rich Question Answering Dataset
DATA Story Cloze Test and ROCStories Corpora
DATA Microsoft Research WikiQA Corpus
DATA DeepMind Q&A Dataset
DATA QASent
DATA Textbook Question Answering

Relationship Extraction

WIKI Relationship extraction
PAPER A deep learning approach for relationship extraction from interaction context in social manufacturing paradigm
CHALLENGE SemEval-2018 Task 7: Semantic Relation Extraction and Classification in Scientific Papers

Semantic Role Labeling

WIKI Semantic role labeling
BOOK Semantic Role Labeling
PAPER End-to-end Learning of Semantic Role Labeling Using Recurrent Neural Networks
PAPER Neural Semantic Role Labeling with Dependency Path Embeddings
PAPER Deep Semantic Role Labeling: What Works and What's Next
CHALLENGE CoNLL-2005 Shared Task: Semantic Role Labeling
CHALLENGE CoNLL-2004 Shared Task: Semantic Role Labeling
TOOLKIT Illinois Semantic Role Labeler (SRL)
DATA CoNLL-2005 Shared Task: Semantic Role Labeling

Sentence Boundary Disambiguation

WIKI Sentence boundary disambiguation
PAPER A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain
TOOLKIT NLTK Tokenizers
DATA The British National Corpus
DATA Switchboard-1 Telephone Speech Corpus

Sentiment Analysis

WIKI Sentiment analysis
INFO Awesome Sentiment Analysis
CHALLENGE Kaggle: UMICH SI650 - Sentiment Classification
CHALLENGE SemEval-2017 Task 4: Sentiment Analysis in Twitter
CHALLENGE SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
PROJECT SenticNet
PROJECT Stanford NLP Group Sentiment Analysis
DATA Multi-Domain Sentiment Dataset (version 2.0)
DATA Stanford Sentiment Treebank
DATA Twitter Sentiment Corpus
DATA Twitter Sentiment Analysis Training Corpus
DATA AFINN: List of English words rated for valence

Sign Language Recognition/Translation

PAPER Video-based Sign Language Recognition without Temporal Segmentation
PAPER SubUNets: End-to-end Hand Shape and Continuous Sign Language Recognition
DATA RWTH-PHOENIX-Weather
DATA ASLLRP
PROJECT SignAll

Singing Voice Synthesis

PAPER Singing voice synthesis based on deep neural networks
PAPER A Neural Parametric Singing Synthesizer Modeling Timbre and Expression from Natural Songs
PRODUCT VOCALOID: voice synthesis technology and software developed by Yamaha
CHALLENGE Special Session Interspeech 2016 Singing synthesis challenge "Fill-in the Gap"

Social Science Applications

WORKSHOP NLP+CSS: Workshops on Natural Language Processing and Computational Social Science
TOOLKIT Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints
TOOLKIT Online Variational Bayes for Latent Dirichlet Allocation (LDA)
GROUP The University of Chicago Knowledge Lab

Source Separation

WIKI Source separation
PAPER From Blind to Guided Audio Source Separation
PAPER Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation
CHALLENGE Signal Separation Evaluation Campaign (SiSEC)
CHALLENGE CHiME Speech Separation and Recognition Challenge

Speaker Authentication

See Speaker Verification

Speaker Diarisation

WIKI Speaker diarisation
PAPER DNN-based speaker clustering for speaker diarisation
PAPER Unsupervised Methods for Speaker Diarization: An Integrated and Iterative Approach
PAPER Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion
CHALLENGE Rich Transcription Evaluation

Speaker Recognition

WIKI Speaker recognition
PAPER A Novel Scheme for Speaker Recognition Using a Phonetically-Aware Deep Neural Network
PAPER Deep Neural Networks for Small Footprint Text-Dependent Speaker Verification
PAPER Deep Speaker: an End-to-End Neural Speaker Embedding System
PROJECT Voice Vector: which of the Hollywood stars is most similar to my voice?
CHALLENGE NIST Speaker Recognition Evaluation (SRE)
INFO Are there any suggestions for free databases for speaker recognition?
DATA VoxCeleb2: Deep Speaker Recognition

Speech Reading

See Lip-reading

Speech Recognition

See Automatic Speech Recognition

Speech Segmentation

WIKI Speech segmentation
PAPER Word Segmentation by 8-Month-Olds: When Speech Cues Count More Than Statistics
PAPER Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings
PAPER Unsupervised Lexicon Discovery from Acoustic Input
PAPER Weakly supervised spoken term discovery using cross-lingual side information
DATA CALLHOME Spanish Speech

Speech Synthesis

WIKI Speech synthesis
PAPER Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
PAPER WaveNet: A Generative Model for Raw Audio
PAPER Tacotron: Towards End-to-End Speech Synthesis
PAPER Deep Voice 3: 2000-Speaker Neural Text-to-Speech
PAPER Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention
DATA The World English Bible
DATA LJ Speech Dataset
DATA Lessac Data
CHALLENGE Blizzard Challenge 2017
PRODUCT Lyrebird
PROJECT The Festvox project
TOOLKIT Merlin: The Neural Network (NN) based Speech Synthesis System

Speech Enhancement

WIKI Speech enhancement
BOOK Speech enhancement: theory and practice
PAPER An Experimental Study on Speech Enhancement Based on Deep Neural Network
PAPER A Regression Approach to Speech Enhancement Based on Deep Neural Networks
PAPER Speech Enhancement Based on Deep Denoising Autoencoder

Speech-To-Text

See Automatic Speech Recognition

Spoken Term Detection

See Speech Segmentation

Stemming

WIKI Stemming
PAPER A Backpropagation Neural Network to Improve Arabic Stemming
TOOLKIT NLTK Stemmers

Term Extraction

Text Similarity

WIKI Semantic similarity
PAPER A Survey of Text Similarity Approaches
PAPER Learning to Rank Short Text Pairs with Convolutional Deep Neural Networks
PAPER Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks
CHALLENGE SemEval-2014 Task 3: Cross-Level Semantic Similarity
CHALLENGE SemEval-2014 Task 10: Multilingual Semantic Textual Similarity
CHALLENGE SemEval-2017 Task 1: Semantic Textual Similarity
WIKI Semantic Textual Similarity Wiki

Text Simplification

WIKI Text simplification
PAPER Aligning Sentences from Standard Wikipedia to Simple Wikipedia
PAPER Problems in Current Text Simplification Research: New Data Can Help
DATA Newsela Data

Text-To-Speech

See Speech Synthesis

Textual Entailment

WIKI Textual entailment
PROJECT Textual Entailment with TensorFlow
PAPER Textual Entailment with Structured Attentions and Composition
CHALLENGE SemEval-2014 Task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment
CHALLENGE SemEval-2013 Task 7: The Joint Student Response Analysis and 8th Recognizing Textual Entailment Challenge

Transliteration

WIKI Transliteration
INFO Transliteration of Non-Latin scripts
PAPER A Deep Learning Approach to Machine Transliteration
CHALLENGE NEWS 2016 Shared Task on Transliteration of Named Entities
PROJECT Neural Japanese Transliteration—can you do better than SwiftKey™ Keyboard?

Voice Conversion

PAPER Phonetic Posteriorgrams for Many-to-One Voice Conversion Without Parallel Data Training
PROJECT Deep neural networks for voice conversion (voice style transfer) in Tensorflow
PROJECT An implementation of voice conversion system utilizing phonetic posteriorgrams
CHALLENGE Voice Conversion Challenge 2016
CHALLENGE Voice Conversion Challenge 2018
DATA CMU_ARCTIC speech synthesis databases
DATA TIMIT Acoustic-Phonetic Continuous Speech Corpus

Voice Recognition

See Speaker Recognition

Word Embeddings

WIKI Word embedding
TOOLKIT Gensim: word2vec
TOOLKIT fastText
TOOLKIT GloVe: Global Vectors for Word Representation
INFO Where to get a pretrained model
PROJECT Pre-trained word vectors
PROJECT Pre-trained word vectors of 30+ languages
PROJECT Polyglot: Distributed word representations for multilingual NLP
PROJECT BPEmb: a collection of pre-trained subword embeddings in 275 languages
CHALLENGE SemEval-2018 Task 10: Capturing Discriminative Attributes
PAPER Bilingual Word Embeddings for Phrase-Based Machine Translation
PAPER A Survey of Cross-Lingual Embedding Models

Word Prediction

INFO What is Word Prediction?
PAPER The prediction of character based on recurrent neural network language model
PAPER An Embedded Deep Learning based Word Prediction
PAPER Evaluating Word Prediction: Framing Keystroke Savings
DATA An Embedded Deep Learning based Word Prediction
PROJECT Word Prediction using Convolutional Neural Networks—can you do better than iPhone™ Keyboard?
CHALLENGE SemEval-2018 Task 2: Multilingual Emoji Prediction

Word Segmentation

WIKI Word segmentation
PAPER Neural Word Segmentation Learning for Chinese
PROJECT Convolutional neural network for Chinese word segmentation
TOOLKIT Stanford Word Segmenter
TOOLKIT NLTK Tokenizers

Word Sense Disambiguation

WIKI Word-sense disambiguation
PAPER Train-O-Matic: Large-Scale Supervised Word Sense Disambiguation in Multiple Languages without Manual Training Data
DATA Train-O-Matic Data
DATA BabelNet
