tf-nlp-blocks
Author: Han Xiao https://hanxiao.github.io
A collection of frequently-used deep learning blocks I have implemented in Tensorflow. It covers the core tasks in NLP such as embedding, encoding, matching and pooling. All implementations follow a modularized design pattern which I call the "block-design". More details can be found in my blog post.
- Requirements
- Contents
  - encode_blocks.py
  - match_blocks.py
  - pool_blocks.py
  - embed_blocks.py
  - mulitask_blocks.py
  - nn.py
- Run
Requirements
- Python >= 3.6
- Tensorflow >= 1.6
Contents
encode_blocks.py
A collection of sequence encoding blocks. The input is a sequence of shape `[B, L, D]`; the output is another sequence of shape `[B, L, D']`, where `B` is the batch size, `L` is the length of the sequence, and `D` and `D'` are the dimensions.
Name | Dependencies | Description | Reference |
---|---|---|---|
`LSTM_encode` | | a fast multi-layer bidirectional LSTM implementation based on `CudnnLSTM`; expect it to be 5~10x faster than the standard tf `LSTMCell`. However, it can only run on GPU. | Tensorflow doc on CudnnLSTM |
`TCN_encode` | `Res_DualCNN_encode` | a temporal convolution network described in the paper: basically a multi-layer dilated CNN with special padding to ensure causality | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
`Res_DualCNN_encode` | `CNN_encode` | a sub-block used by `TCN_encode`: a two-layer CNN with spatial dropout in between, followed by a residual connection and a layer-norm | An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling |
`CNN_encode` | | a standard conv1d implementation along the `L` axis, with the possibility to set different paddings | Convolutional Neural Networks for Sentence Classification |
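As a rough illustration of the `[B, L, D]` to `[B, L, D']` contract these blocks share, here is a minimal sketch. The name `toy_cnn_encode` and its arguments are hypothetical stand-ins, not the actual signatures in `encode_blocks.py`.

```python
import tensorflow as tf

def toy_cnn_encode(seqs, num_filters=128, kernel_size=3):
    """Hypothetical encode block: [B, L, D] -> [B, L, D']."""
    # A single conv1d over the time axis L; here D' equals num_filters.
    return tf.layers.conv1d(seqs, filters=num_filters, kernel_size=kernel_size,
                            padding='same', activation=tf.nn.relu)

x = tf.placeholder(tf.float32, [None, 20, 300])  # [B, L, D]
y = toy_cnn_encode(x)                            # [B, L, 128]
```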
match_blocks.py
A collection of sequence matching blocks, a.k.a. attention. The input is two sequences: `context` of shape `[B, L_c, D]` and `query` of shape `[B, L_q, D]`. The output is a sequence with the same length as `context`, i.e. of shape `[B, L_c, D]`. Each position in the output encodes the relevance of that position in `context` to the complete `query`.
Name | Dependencies | Description | Reference |
---|---|---|---|
`Attentive_match` | | basic attention mechanism with different scoring functions; also supports future blinding | `additive`: Neural machine translation by jointly learning to align and translate; `scaled`: Attention is all you need |
`Transformer_match` | | a multi-head attention block from "Attention is all you need" | Attention is all you need |
`AttentiveCNN_match` | `Attentive_match` | the light version of attentive convolution, with the possibility of future blinding to ensure causality | Attentive Convolution |
`BiDaf_match` | | the attention flow layer used in the BiDAF model | Bidirectional Attention Flow for Machine Comprehension |
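A minimal sketch of this matching contract, using plain scaled dot-product attention; `toy_scaled_match` is a hypothetical illustration, not the `Attentive_match` API from this repo.

```python
import tensorflow as tf

def toy_scaled_match(context, query):
    """Hypothetical matching block: ([B, L_c, D], [B, L_q, D]) -> [B, L_c, D]."""
    d = tf.cast(tf.shape(query)[-1], tf.float32)
    # Score every context position against every query position.
    scores = tf.matmul(context, query, transpose_b=True) / tf.sqrt(d)  # [B, L_c, L_q]
    weights = tf.nn.softmax(scores)        # softmax over the query axis L_q
    return tf.matmul(weights, query)       # each context position summarizes the query
```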
pool_blocks.py
A collection of pooling blocks. They fuse/reduce along the time axis `L`. The input is a sequence of shape `[B, L, D]`; the output has shape `[B, D]`.
Name | Dependencies | Description | Reference |
---|---|---|---|
`SWEM_pool` | | pooling on the input sequence; supports max/avg. pooling and hierarchical avg.-max pooling | Baseline Needs More Love: On Simple Word-Embedding-Based Models and Associated Pooling Mechanisms |
There are also some convolution-based pooling blocks built on `SWEM_pool`, but they are for experimental purposes, so I will not list them here.
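The `[B, L, D]` to `[B, D]` pooling contract can be sketched as below. This is a hypothetical example (including the `toy_swem_pool` name and its length-masking behavior), not the repo's actual `SWEM_pool` implementation.

```python
import tensorflow as tf

def toy_swem_pool(seqs, seq_lens, method='max'):
    """Hypothetical pooling block: [B, L, D] -> [B, D], respecting true lengths."""
    mask = tf.sequence_mask(seq_lens, maxlen=tf.shape(seqs)[1], dtype=tf.float32)
    mask = tf.expand_dims(mask, axis=-1)                       # [B, L, 1]
    if method == 'avg':
        # Mean over the valid (non-padded) positions only.
        return tf.reduce_sum(seqs * mask, axis=1) / tf.maximum(
            tf.reduce_sum(mask, axis=1), 1e-10)
    # Max pooling: push padded positions to a very negative value first.
    return tf.reduce_max(seqs + (mask - 1.0) * 1e30, axis=1)
```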
embed_blocks.py
A collection of positional encoding blocks for sequences.
Name | Dependencies | Description | Reference |
---|---|---|---|
`SinusPositional_embed` | | generates a sinusoid signal with the same length as the input sequence | Attention is all you need |
`Positional_embed` | | parameterizes the absolute positions of the tokens in the input sequence | A Convolutional Encoder Model for Neural Machine Translation |
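For reference, a minimal sketch of a sinusoid positional signal in the style of "Attention is all you need"; `toy_sinus_embed` is a hypothetical illustration, not the `SinusPositional_embed` API.

```python
import numpy as np
import tensorflow as tf

def toy_sinus_embed(seq_len, dim):
    """Hypothetical sinusoid signal [1, seq_len, dim], broadcast-added to [B, seq_len, dim]."""
    pos = np.arange(seq_len)[:, None].astype(np.float32)             # [L, 1]
    i = np.arange(dim)[None, :].astype(np.float32)                   # [1, D]
    angle = pos / np.power(10000.0, (2.0 * np.floor(i / 2.0)) / dim)
    angle[:, 0::2] = np.sin(angle[:, 0::2])   # even dimensions: sine
    angle[:, 1::2] = np.cos(angle[:, 1::2])   # odd dimensions: cosine
    return tf.constant(angle[None, ...], dtype=tf.float32)
```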
mulitask_blocks.py
A collection of multi-task learning blocks. So far only the "cross-stitch block" is available.
Name | Dependencies | Description | Reference |
---|---|---|---|
`CrossStitch` | | a cross-stitch block, modeling the correlation & self-correlation of two tasks | Cross-stitch Networks for Multi-task Learning |
`Stack_CrossStitch` | `CrossStitch` | stacks multiple cross-stitch blocks together with shared/separated input | Cross-stitch Networks for Multi-task Learning |
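The idea behind a cross-stitch unit can be sketched as follows: the activations of two tasks are mixed through a small learnable matrix. This is a hypothetical, simplified illustration (scalar 2x2 mixing), not the repo's `CrossStitch` implementation.

```python
import tensorflow as tf

def toy_cross_stitch(x_a, x_b):
    """Hypothetical cross-stitch unit: mixes two tasks' activations via a learnable 2x2 matrix."""
    alpha = tf.get_variable(
        'alpha', shape=[2, 2],
        initializer=tf.constant_initializer([[0.9, 0.1], [0.1, 0.9]]))
    # Each output is a weighted combination of both task activations
    # (diagonal = self-correlation, off-diagonal = cross-task correlation).
    out_a = alpha[0, 0] * x_a + alpha[0, 1] * x_b
    out_b = alpha[1, 0] * x_a + alpha[1, 1] * x_b
    return out_a, out_b
```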
nn.py
A collection of auxiliary functions, e.g. masking, normalizing, slicing.
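As an example of the kind of helper this covers, here is a minimal masking sketch; `toy_mask` is a hypothetical name, not a function from `nn.py`.

```python
import tensorflow as tf

def toy_mask(seqs, seq_lens):
    """Hypothetical masking helper: zero out padded positions of a [B, L, D] batch."""
    mask = tf.sequence_mask(seq_lens, maxlen=tf.shape(seqs)[1], dtype=tf.float32)
    return seqs * tf.expand_dims(mask, -1)
```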
Run
Run `app.py` for a simple test on toy data.