NLP-Projects
word2vec, sentence2vec, machine reading comprehension, dialog system, text classification, pretrained language model (i.e., XLNet, BERT, ELMo, GPT), sequence labeling, information retrieval, informati...
Natural Language Processing projects, which include concepts and scripts about:
- Word2vec: gensim, fastText and tensorflow implementations (a minimal usage sketch follows this list). See Chinese notes, 中文解读
- Sentence2vec: doc2vec, word2vec averaging and Smooth Inverse Frequency implementations
- Dialog system: categories and components of dialog system
- Text classification: tensorflow LSTM (see Chinese notes 1, 中文解读 1 and Chinese notes 2, 中文解读 2) and fastText implementations
- Pretrained language model: principles of ELMo, ULMFit, GPT, BERT and XLNet
- Chinese_word_segmentation: HMM Viterbi implementations. See Chinese notes, 中文解读
- Named_Entity_Recognition: Brands NER via bi-directional LSTM + CRF, tensorflow implementation. See Chinese notes, 中文解读
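To make the Word2vec and Sentence2vec entries concrete, here is a minimal sketch (not the repo's scripts): it trains a skip-gram model with gensim and builds a word-vector-averaging sentence vector. It assumes the gensim 4.x API (where the size argument is `vector_size`) and uses a made-up toy corpus.

```python
# Minimal word2vec / sentence2vec sketch; gensim 4.x API assumed, toy corpus is illustrative.
import numpy as np
from gensim.models import Word2Vec

sentences = [
    ["natural", "language", "processing", "is", "fun"],
    ["word", "embeddings", "capture", "semantic", "similarity"],
    ["nlp", "models", "use", "word", "embeddings"],
]

# Skip-gram word2vec (sg=1); CBOW would be sg=0.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("word", topn=3))

# Simplest sentence2vec baseline: average the word vectors of a sentence.
def sentence_vector(tokens, wv):
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(wv.vector_size)

print(sentence_vector(["word", "embeddings", "are", "useful"], model.wv)[:5])
```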
Concepts
1. Attention
- Attention == weighted averages
- The attention review 1 and review 2 summarize the attention mechanism into several types (see the sketch below for the first two):
  - Additive vs Multiplicative attention
  - Self attention
  - Soft vs Hard attention
  - Global vs Local attention
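To make the "weighted averages" point and the additive-vs-multiplicative distinction concrete, here is a minimal NumPy sketch; the dimensions, random weights and single-query setup are illustrative assumptions rather than code from this repo.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d = 8                                   # hidden size (illustrative)
query = np.random.randn(d)              # one query, e.g. a decoder state
keys = np.random.randn(5, d)            # 5 keys, e.g. encoder states
values = keys                           # reuse keys as values for simplicity

# Multiplicative (scaled dot-product) attention: score = q . k / sqrt(d)
mult_scores = keys @ query / np.sqrt(d)

# Additive (Bahdanau-style) attention: score = v^T tanh(W_q q + W_k k)
W_q, W_k, v = np.random.randn(d, d), np.random.randn(d, d), np.random.randn(d)
add_scores = np.tanh(query @ W_q + keys @ W_k) @ v

# Either way, the output is a weighted average of the values.
for scores in (mult_scores, add_scores):
    weights = softmax(scores)
    context = weights @ values
    print(weights.round(2), context.shape)
```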
 
2. CNNs, RNNs and Transformer
- Parallelization [1] (see the sketch below)
  - RNNs
    - Why not good?
      - Last step's output is the input of the current step
    - Solutions
      - Simple Recurrent Units (SRU): perform parallelization on each hidden-state neuron independently
      - Sliced RNNs: separate the sequence into windows, run RNNs within each window, then run another RNN over the windows (same idea as CNNs)
  - CNNs
    - Why good?
      - Parallel across different windows within one filter
      - Parallel across different filters
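A minimal NumPy sketch of the parallelization point above: the RNN loop has to feed each step's output into the next step, while every convolution window and every filter is independent and can be computed in a single matrix multiplication. The shapes and the plain tanh cell are illustrative assumptions.

```python
import numpy as np

T, d = 6, 4                             # sequence length, hidden size (illustrative)
x = np.random.randn(T, d)

# RNN: inherently sequential, because step t needs h from step t-1.
W_x, W_h = np.random.randn(d, d), np.random.randn(d, d)
h = np.zeros(d)
for t in range(T):                      # these iterations cannot run in parallel
    h = np.tanh(x[t] @ W_x + h @ W_h)

# CNN: every window and every filter is independent -> fully parallel.
window, n_filters = 3, 5
filters = np.random.randn(n_filters, window * d)
windows = np.stack([x[t:t + window].ravel() for t in range(T - window + 1)])
conv_out = windows @ filters.T          # one matmul covers all windows and filters
print(conv_out.shape)                   # (T - window + 1, n_filters) = (4, 5)
```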
- Long-range dependency [1]
  - CNNs
    - Why not good?
      - A single convolution can only capture window-range dependency
    - Solutions
      - Dilated CNNs
      - Deep CNNs: N * [convolution + skip-connection]
        - For example, with window size = 3 and sliding step = 1, the second convolution can cover 5 words (i.e., 1-2-3, 2-3-4, 3-4-5); see the sketch below
  - Transformer > RNNs > CNNs
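The 5-word example above can be checked with standard receptive-field arithmetic; the small helper below (an illustration, not repo code) computes how many words a stack of stride-1 convolutions can see, with and without dilation.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field of stacked 1D convolutions with stride 1."""
    rf = 1
    for k, d in zip(kernel_sizes, dilations):
        rf += (k - 1) * d
    return rf

# Two plain convolutions with window size 3: the second layer sees 5 words,
# matching the 1-2-3, 2-3-4, 3-4-5 example above.
print(receptive_field([3, 3], [1, 1]))        # 5

# Dilated CNNs grow the receptive field much faster with depth.
print(receptive_field([3, 3, 3], [1, 2, 4]))  # 15
```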
- Position [1]
  - CNNs
    - Why not good?
      - Convolution preserves relative-order information, but max-pooling discards it
    - Solutions
      - Discard max-pooling and use deep CNNs with skip-connections instead
      - Add position embeddings, just like in ConvS2S (see the sketch below)
  - Transformer (self-attention)
    - Why not good?
      - In self-attention, one word attends to the other words and generates the summarization vector without relative position information
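A minimal sketch of the "add position embeddings" solution: either a learned lookup table (ConvS2S-style) or fixed sinusoidal encodings (original Transformer) are simply added to the word embeddings before the encoder. The sizes and random word embeddings below are illustrative assumptions.

```python
import numpy as np

T, d = 10, 16                           # sequence length, embedding size (illustrative)
word_emb = np.random.randn(T, d)        # stand-in for real word embeddings

# Option 1: learned position embeddings (ConvS2S-style), a trainable lookup table.
pos_table = np.random.randn(T, d)       # would be learned during training
x_learned = word_emb + pos_table

# Option 2: fixed sinusoidal position encodings (original Transformer).
pos = np.arange(T)[:, None]
i = np.arange(d)[None, :]
angles = pos / np.power(10000, (2 * (i // 2)) / d)
sin_cos = np.where(i % 2 == 0, np.sin(angles), np.cos(angles))
x_sinusoidal = word_emb + sin_cos

print(x_learned.shape, x_sinusoidal.shape)  # (10, 16) (10, 16)
```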
- Semantic feature extraction [2]
  - Transformer > CNNs == RNNs
 
3. Pattern of DL in NLP models [3]
- Data
  - Preprocess
    - Sub-word segmentation to avoid OOV and reduce vocabulary size (see the sketch below)
  - Pre-training (e.g., ELMo, BERT)
  - Multi-task learning
  - Transfer learning, ref_1, ref_2
    - Use source task/domain S to improve target task/domain T
    - If S has zero/one/few instances, we call it zero-shot, one-shot or few-shot learning, respectively
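To illustrate the sub-word segmentation point, here is a tiny greedy longest-match segmenter over a hand-made subword vocabulary, a simplification of BPE/WordPiece rather than the repo's preprocessing; the vocabulary and the `[UNK]` token are assumptions for the example. It shows how an out-of-vocabulary word is still covered by smaller units.

```python
def subword_segment(word, vocab, unk="[UNK]"):
    """Greedy longest-match segmentation of a word into subword units."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:                 # no subword matches -> unknown token
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces

# Toy subword vocabulary (illustrative); a real one would come from BPE/WordPiece training.
vocab = {"un", "break", "able", "token", "ization", "s", "word"}

# "unbreakable" may be OOV as a whole word, but its subwords are in the vocabulary.
print(subword_segment("unbreakable", vocab))    # ['un', 'break', 'able']
print(subword_segment("tokenizations", vocab))  # ['token', 'ization', 's']
```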
- Model
  - Encoder
    - CNNs, RNNs, Transformer
  - Structure
    - Sequential, Tree, Graph
- Learning (change loss definition)
  - Adversarial learning
  - Reinforcement learning
 
References
- [1] Review
- [2] Why self-attention? A targeted evaluation of neural machine translation architectures
- [3] ACL 2019 oral
Awesome public APIs
Awesome packages
Chinese
English
- spaCy
- gensim
- Install tensorflow with one line: conda install tensorflow-gpu