KoBigBird
Pretrained BigBird Model for Korean (up to 4096 tokens)
What is BigBird • How to Use • Pretraining • Evaluation Result • Docs • Citation
한국어 | English
What is BigBird?
BigBird is a sparse-attention-based model introduced in BigBird: Transformers for Longer Sequences, and it can handle longer sequences than a standard BERT.
- Longer Sequence - handles sequences of up to 4096 tokens, 8x the 512-token limit of BERT
- Computational Efficiency - uses sparse attention instead of full attention, reducing complexity from O(n²) to O(n)
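As a quick check, these settings can be inspected from the model config on the Hub. This is a minimal sketch; the field names follow the Hugging Face BigBird implementation, and the comments describe the expected values rather than guaranteed output:

```python
from transformers import AutoConfig

# Load the KoBigBird config and inspect its sparse-attention settings.
config = AutoConfig.from_pretrained("monologg/kobigbird-bert-base")

print(config.attention_type)           # "block_sparse" -> linear-time attention
print(config.block_size)               # tokens per attention block
print(config.num_random_blocks)        # random blocks each query block attends to
print(config.max_position_embeddings)  # 4096
```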
How to Use
- The model uploaded to the 🤗 Huggingface Hub can be used right away :)
- We recommend transformers>=4.11.0, which resolves some issues. (PR related to the MRC issue)
- BertTokenizer must be used instead of BigBirdTokenizer. (AutoTokenizer loads BertTokenizer.)
- For detailed usage, please refer to the BigBird Transformers documentation.
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")  # BigBirdModel
tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")  # BertTokenizer
```
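To make use of the 4096-token window, pass a long max_length when encoding. The following is a minimal sketch with a placeholder input; padding to the full length keeps the model in block-sparse mode, since the Hugging Face implementation falls back to full attention for short inputs:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")
model.eval()

# Placeholder long document; any Korean text up to 4096 tokens works.
long_text = "긴 한국어 문서 ... " * 100

inputs = tokenizer(
    long_text,
    max_length=4096,
    truncation=True,
    padding="max_length",  # pad to 4096 so block-sparse attention is used
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

print(outputs.last_hidden_state.shape)  # torch.Size([1, 4096, 768])
```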
Pretraining
For details, see [Pretraining BigBird].
| | Hardware | Max len | LR | Batch | Train Step | Warmup Step |
|---|---|---|---|---|---|---|
| KoBigBird-BERT-Base | TPU v3-8 | 4096 | 1e-4 | 32 | 2M | 20k |
- Trained on a variety of data including Modu Corpus (모두의 말뭉치), Korean Wikipedia, Common Crawl, and news data
- Trained as an ITC (Internal Transformer Construction) model (ITC vs ETC)
Evaluation Result
1. Short Sequence (<=512)
For details, see [Finetune on Short Sequence Dataset].
| | NSMC (acc) | KLUE-NLI (acc) | KLUE-STS (pearsonr) | Korquad 1.0 (em/f1) | KLUE MRC (em/rouge-w) |
|---|---|---|---|---|---|
| KoELECTRA-Base-v3 | 91.13 | 86.87 | 93.14 | 85.66 / 93.94 | 59.54 / 65.64 |
| KLUE-RoBERTa-Base | 91.16 | 86.30 | 92.91 | 85.35 / 94.53 | 69.56 / 74.64 |
| KoBigBird-BERT-Base | 91.18 | 87.17 | 92.61 | 87.08 / 94.71 | 70.33 / 75.34 |
2. Long Sequence (>=1024)
For details, see [Finetune on Long Sequence Dataset].
| | TyDi QA (em/f1) | Korquad 2.1 (em/f1) | Fake News (f1) | Modu Sentiment (f1-macro) |
|---|---|---|---|---|
| KLUE-RoBERTa-Base | 76.80 / 78.58 | 55.44 / 73.02 | 95.20 | 42.61 |
| KoBigBird-BERT-Base | 79.13 / 81.30 | 67.77 / 82.03 | 98.85 | 45.42 |
Docs
- Pretraining BigBird
- Finetune on Short Sequence Dataset
- Finetune on Long Sequence Dataset
- Download Tensorflow v1 checkpoint
- GPU Benchmark result
Citation
If you use KoBigBird, please cite it as follows.
```bibtex
@software{jangwon_park_2021_5654154,
  author    = {Jangwon Park and Donggyu Kim},
  title     = {KoBigBird: Pretrained BigBird Model for Korean},
  month     = nov,
  year      = 2021,
  publisher = {Zenodo},
  version   = {1.0.0},
  doi       = {10.5281/zenodo.5654154},
  url       = {https://doi.org/10.5281/zenodo.5654154}
}
```
Contributors
Acknowledgements
KoBigBird was built with Cloud TPU support from the Tensorflow Research Cloud (TFRC) program.
We also thank Seyun Ahn for the wonderful logo.