NLP-Projects
word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, informati...
Natural Language Processing projects, which include concepts and scripts about:
- Word2vec: gensim, fastText and tensorflow implementations (a minimal usage sketch follows this list). See Chinese notes, 中文解读
- Sentence2vec: doc2vec, word2vec averaging and Smooth Inverse Frequency implementations
- Dialog system: categories and components of dialog system
- Text classification: tensorflow LSTM (see Chinese notes 1, 中文解读 1 and Chinese notes 2, 中文解读 2) and fastText implementations
- Pretrained language model: principles of ELMo, ULMFit, GPT, BERT and XLNet
- Chinese_word_segmentation: HMM Viterbi implementations. See Chinese notes, 中文解读
- Named_Entity_Recognition: Brands NER via bi-directional LSTM + CRF, tensorflow implementation. See Chinese notes, 中文解读
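To make the Word2vec and Sentence2vec entries concrete, here is a minimal sketch (not the repo's scripts): it trains a skip-gram model with gensim and builds a word-vector-averaging sentence vector. It assumes the gensim 4.x API (where the size argument is `vector_size`) and uses a made-up toy corpus.

```python
# Minimal word2vec / sentence2vec sketch; gensim 4.x API assumed, toy corpus is illustrative.
import numpy as np
from gensim.models import Word2Vec

sentences = [
    ["natural", "language", "processing", "is", "fun"],
    ["word", "embeddings", "capture", "semantic", "similarity"],
    ["nlp", "models", "use", "word", "embeddings"],
]

# Skip-gram word2vec (sg=1); CBOW would be sg=0.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("word", topn=3))

# Simplest sentence2vec baseline: average the word vectors of a sentence.
def sentence_vector(tokens, wv):
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

print(sentence_vector(["word", "embeddings", "are", "useful"], model.wv)[:5])
```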
Concepts
1. Attention
- Attention == weighted averages
- The attention review 1 and review 2 summarize the attention mechanism into several types (see the sketch below for the first two):
  - Additive vs Multiplicative attention
  - Self attention
  - Soft vs Hard attention
  - Global vs Local attention
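To make the "weighted averages" point and the additive-vs-multiplicative distinction concrete, here is a minimal NumPy sketch; the dimensions, random weights and single-query setup are illustrative assumptions rather than code from this repo.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 8                                   # hidden size (illustrative)
query = np.random.randn(d)              # one query, e.g. a decoder state
keys = np.random.randn(5, d)            # 5 keys, e.g. encoder states
values = keys                           # reuse keys as values for simplicity

# Multiplicative (scaled dot-product) attention: score = q . k / sqrt(d)
mult_scores = keys @ query / np.sqrt(d)

# Additive (Bahdanau-style) attention: score = v^T tanh(W_q q + W_k k)
W_q, W_k, v = np.random.randn(d, d), np.random.randn(d, d), np.random.randn(d)
add_scores = np.tanh(query @ W_q + keys @ W_k) @ v

# Either way, the output is a weighted average of the values.
for scores in (mult_scores, add_scores):
    weights = softmax(scores)
    context = weights @ values
    print(weights.round(2), context.shape)
```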
 
2. CNNs, RNNs and Transformer
- Parallelization [1] (see the sketch below)
  - RNNs
    - Why not good?
      - Last step's output is the input of the current step
    - Solutions
      - Simple Recurrent Units (SRU): perform parallelization on each hidden-state neuron independently
      - Sliced RNNs: separate the sequence into windows, run RNNs within each window, then run another RNN over the windows (same idea as CNNs)
  - CNNs
    - Why good?
      - Parallel across different windows within one filter
      - Parallel across different filters
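A minimal NumPy sketch of the parallelization point above: the RNN loop has to feed each step's output into the next step, while every convolution window and every filter is independent and can be computed in a single matrix multiplication. The shapes and the plain tanh cell are illustrative assumptions.

```python
import numpy as np

T, d = 6, 4                             # sequence length, hidden size (illustrative)
x = np.random.randn(T, d)

# RNN: inherently sequential, because step t needs h from step t-1.
W_x, W_h = np.random.randn(d, d), np.random.randn(d, d)
h = np.zeros(d)
for t in range(T):                      # these iterations cannot run in parallel
    h = np.tanh(x[t] @ W_x + h @ W_h)

# CNN: every window and every filter is independent -> fully parallel.
window, n_filters = 3, 5
filters = np.random.randn(n_filters, window * d)
windows = np.stack([x[t:t + window].ravel() for t in range(T - window + 1)])
conv_out = windows @ filters.T          # one matmul covers all windows and filters
print(conv_out.shape)                   # (T - window + 1, n_filters) = (4, 5)
```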
- Long-range dependency [1]
  - CNNs
    - Why not good?
      - A single convolution can only capture window-range dependency
    - Solutions
      - Dilated CNNs
      - Deep CNNs: N * [convolution + skip-connection]
        - For example, with window size = 3 and sliding step = 1, the second convolution can cover 5 words (i.e., 1-2-3, 2-3-4, 3-4-5); see the sketch below
  - Transformer > RNNs > CNNs
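The 5-word example above can be checked with standard receptive-field arithmetic; the small helper below (an illustration, not repo code) computes how many words a stack of stride-1 convolutions can see, with and without dilation.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of stacked 1D convolutions with stride 1."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Two plain convolutions with window size 3: the second layer sees 5 words,
# matching the 1-2-3, 2-3-4, 3-4-5 example above.
print(receptive_field([3, 3], [1, 1]))        # 5

# Dilated CNNs grow the receptive field much faster with depth.
print(receptive_field([3, 3, 3], [1, 2, 4]))  # 15
```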
- Position [1]
  - CNNs
    - Why not good?
      - Convolution preserves relative-order information, but max-pooling discards it
    - Solutions
      - Discard max-pooling and use deep CNNs with skip-connections instead
      - Add position embeddings, just like in ConvS2S (see the sketch below)
  - Transformer (self-attention)
    - Why not good?
      - In self-attention, one word attends to the other words and generates the summarization vector without relative position information
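A minimal sketch of the "add position embeddings" solution: either a learned lookup table (ConvS2S-style) or fixed sinusoidal encodings (original Transformer) are simply added to the word embeddings before the encoder. The sizes and random word embeddings below are illustrative assumptions.

```python
import numpy as np

T, d = 10, 16                           # sequence length, embedding size (illustrative)
word_emb = np.random.randn(T, d)        # stand-in for real word embeddings

# Option 1: learned position embeddings (ConvS2S-style), a trainable lookup table.
pos_table = np.random.randn(T, d)       # would be learned during training
x_learned = word_emb + pos_table

# Option 2: fixed sinusoidal position encodings (original Transformer).
pos = np.arange(T)[:, None]
i = np.arange(d)[None, :]
angles = pos / np.power(10000, (2 * (i // 2)) / d)
sin_cos = np.where(i % 2 == 0, np.sin(angles), np.cos(angles))
x_sinusoidal = word_emb + sin_cos

print(x_learned.shape, x_sinusoidal.shape)  # (10, 16) (10, 16)
```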
- Semantic feature extraction [2]
  - Transformer > CNNs == RNNs
 
3. Pattern of DL in NLP models [3]
- Data
  - Preprocess
    - Sub-word segmentation to avoid OOV and reduce vocabulary size (see the sketch below)
  - Pre-training (e.g., ELMo, BERT)
  - Multi-task learning
  - Transfer learning, ref_1, ref_2
    - Use source task/domain S to improve target task/domain T
    - If S has zero/one/few instances, we call it zero-shot, one-shot or few-shot learning, respectively
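To illustrate the sub-word segmentation point, here is a tiny greedy longest-match segmenter over a hand-made subword vocabulary, a simplification of BPE/WordPiece rather than the repo's preprocessing; the vocabulary and the `[UNK]` token are assumptions for the example. It shows how an out-of-vocabulary word is still covered by smaller units.

```python
def subword_segment(word, vocab, unk="[UNK]"):
    """Greedy longest-match segmentation of a word into subword units."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:                 # no subword matches -> unknown token
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces

# Toy subword vocabulary (illustrative); a real one would come from BPE/WordPiece training.
vocab = {"un", "break", "able", "token", "ization", "s", "word"}

# "unbreakable" may be OOV as a whole word, but its subwords are in the vocabulary.
print(subword_segment("unbreakable", vocab))    # ['un', 'break', 'able']
print(subword_segment("tokenizations", vocab))  # ['token', 'ization', 's']
```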
- Model
  - Encoder
    - CNNs, RNNs, Transformer
  - Structure
    - Sequential, Tree, Graph
- Learning (change loss definition)
  - Adversarial learning
  - Reinforcement learning
 
References
- [1] Review
- [2] Why self-attention? A targeted evaluation of neural machine translation architectures
- [3] ACL 2019 oral
Awesome public APIs
Awesome packages
Chinese
English
- spaCy
- gensim
- Install tensorflow with one line: conda install tensorflow-gpu