SPARTA
Semantic Parsing And Relational Table Aware model that generates SQL from questions written in Korean
SPARTA (Semantic Parsing And Relational Table Aware)
This is a term project for the Unstructured Text Analysis class. We implement a deep learning model that converts questions written in Korean into SQL queries.
Team Members
- Hoonsang Yoon
- Jaehyuk Heo
- Jungwoo Choi
- Jeongseob Kim
Information
- Korea University DSBA Lab
- Advisor: Pilsung Kang
Demo
Check out the demo here.
Video
Text2SQL Result Video
Dataset
tar xvjf data/data.tar.bz2
Korean WikiSQL dataset (one archive per translation method; see Translation)
unzip data/ko_token.zip
unzip data/ko_token_not_h.zip
unzip data/ko_from_table.zip
unzip data/ko_from_table_not_h.zip
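Assuming the extracted files keep the layout of the English WikiSQL release (a questions file and a `*.tables.jsonl` file, both in JSON Lines format and joined on `table_id`), a minimal sketch for loading a split and attaching each question's table could look like this. The field names `table_id`, `id` and `header` come from English WikiSQL; that the Korean archives use the same names is an assumption, and the helper names are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one JSON object per non-empty line (JSON Lines format)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def join_questions_with_tables(question_lines: Iterable[str],
                               table_lines: Iterable[str]) -> List[dict]:
    """Attach each question's table record, matching question["table_id"]
    against table["id"]."""
    tables: Dict[str, dict] = {t["id"]: t for t in parse_jsonl(table_lines)}
    examples = []
    for example in parse_jsonl(question_lines):
        example["table"] = tables[example["table_id"]]
        examples.append(example)
    return examples
```

Because the functions take any iterable of lines, an open file works directly, e.g. `join_questions_with_tables(open("data/dev.jsonl", encoding="utf-8"), open("data/dev.tables.jsonl", encoding="utf-8"))` (hypothetical paths). Opening with `encoding="utf-8"` matters once the questions contain Hangul.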
Translation
We translated the English questions into Korean in four ways, which differ in which words of the English question are kept untranslated:
| No | Method | Data Name | Description |
|---|---|---|---|
| 1 | Where+Select | ko_token | Keep the WHERE values in the label and the column used in the SELECT clause untranslated |
| 2 | Where | ko_token_not_h | Keep the WHERE values in the label untranslated |
| 3 | Table+Header | ko_from_table | Keep words that appear as table values or headers untranslated |
| 4 | Table | ko_from_table_not_h | Keep words that appear as table values untranslated |
[Figures: Method 1 (Where+Select), Method 2 (Where), Method 3 (Table+Header), Method 4 (Table)]
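One way to keep selected English words intact through a machine translator is to swap them for placeholder tokens before translation and restore them afterwards. The sketch below assumes the translator leaves tokens such as `KEEP0` unchanged, which is not guaranteed; `mask_phrases`, `unmask` and `method1_phrases` are hypothetical helpers, not code from this repository. `method1_phrases` collects the words Method 1 keeps (WHERE values and the SELECT column) from a WikiSQL-format example.

```python
import re
from typing import Dict, List, Tuple


def mask_phrases(question: str, phrases: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each phrase that occurs as a whole-word match with a
    numbered placeholder (KEEP0, KEEP1, ...) and return the mapping."""
    mapping: Dict[str, str] = {}
    masked = question
    # Longest first (ties broken alphabetically), so "New York" is masked
    # before "New" could split it.
    for phrase in sorted(set(phrases), key=lambda p: (-len(p), p)):
        if not phrase:
            continue
        # Lookarounds stop "1" from matching inside "KEEP1" or "2014".
        pattern = re.compile(r"(?<!\w)" + re.escape(phrase) + r"(?!\w)", re.IGNORECASE)
        match = pattern.search(masked)
        if match:
            token = f"KEEP{len(mapping)}"
            mapping[token] = match.group(0)  # keep the question's own casing
            masked = pattern.sub(token, masked)
    return masked, mapping


def unmask(text: str, mapping: Dict[str, str]) -> str:
    """Restore placeholders; digits are matched greedily, so KEEP10 is not read as KEEP1."""
    return re.sub(r"KEEP\d+", lambda m: mapping.get(m.group(0), m.group(0)), text)


def method1_phrases(example: dict) -> List[str]:
    """Method 1 keeps the WHERE values and the SELECT column (WikiSQL format)."""
    sql = example["sql"]
    header = example["table"]["header"]
    return [str(value) for _, _, value in sql["conds"]] + [header[sql["sel"]]]
```

The masked question is what would be sent for translation; `unmask` is then applied to the Korean output.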
Run translation
- Create a dataframe of the English questions to be translated into Korean.
bash run_translate.sh value
- Translate the English questions into Korean with Google Translate and copy the resulting text file into the ko_data directory (e.g., 'ko_train_question.txt').
- Insert the translated Korean questions.
bash run_translate.sh token
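The last step pairs each translated line with its original question. A minimal sketch, assuming the text file holds one Korean question per line in the same order as the source questions; `insert_translations`, `to_jsonl` and the `question_en` field are hypothetical and not taken from `run_translate.sh`.

```python
import json
from typing import Iterable, List


def insert_translations(examples: List[dict], translated_lines: Iterable[str]) -> List[dict]:
    """Return copies of the examples with "question" replaced by the Korean
    line at the same position; the English original goes to "question_en"."""
    translated = [line.strip() for line in translated_lines]
    if len(translated) != len(examples):
        raise ValueError(
            f"{len(translated)} translated lines for {len(examples)} questions")
    merged = []
    for example, korean in zip(examples, translated):
        new_example = dict(example)  # shallow copy; the input list is left untouched
        new_example["question_en"] = example["question"]
        new_example["question"] = korean
        merged.append(new_example)
    return merged


def to_jsonl(examples: List[dict]) -> str:
    """Serialize as JSON Lines; ensure_ascii=False keeps Hangul readable."""
    return "".join(json.dumps(ex, ensure_ascii=False) + "\n" for ex in examples)
```

The strict length check is deliberate: a single dropped or merged line in the translated file would otherwise shift every later question onto the wrong SQL label.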
SPARTA Model
We use pretrained multilingual BERT as the encoder.
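A SQLova-style input [2] places the question and all column headers in a single sequence separated by `[SEP]`. The sketch below builds such a sequence with segment ids 0 for the question part and 1 for the headers, and records where each header starts so a column-wise classifier can pool the encoder output per column. Whether SPARTA uses exactly this layout is an assumption; `build_bert_input` is a hypothetical helper and `tokenize` is any function mapping a string to a list of tokens.

```python
from typing import Callable, List, Tuple


def build_bert_input(question: str, headers: List[str],
                     tokenize: Callable[[str], List[str]]
                     ) -> Tuple[List[str], List[int], List[int]]:
    """Return (tokens, segment_ids, header_starts) for
    [CLS] question [SEP] header_1 [SEP] header_2 [SEP] ..."""
    tokens = ["[CLS]"] + tokenize(question) + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    header_starts = []
    for header in headers:
        header_starts.append(len(tokens))  # index of this header's first token
        piece = tokenize(header) + ["[SEP]"]
        tokens += piece
        segment_ids += [1] * len(piece)
    return tokens, segment_ids, header_starts
```

With Hugging Face `transformers`, `tokenize` could be, for example, `BertTokenizer.from_pretrained("bert-base-multilingual-cased").tokenize`; that checkpoint name is only an example, not necessarily the one used in this project.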
Subtask
Seq2Seq
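In the subtask formulation (SQLova, HydraNet) the model fills the slots of WikiSQL's fixed query sketch: one SELECT column, one aggregation, and a list of (column, operator, value) WHERE conditions, whereas the Seq2Seq formulation (Bridge) generates the query tokens directly. The sketch below only shows how filled slots turn into a SQL string; `sketch_to_sql` and the default table name `t` are hypothetical, and the operator lists mirror the ones used in WikiSQL queries.

```python
from typing import List, Sequence, Tuple, Union

# Aggregation and condition operators used in WikiSQL queries;
# AGG_OPS[0] means "no aggregation".
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<"]

Value = Union[str, int, float]


def quote_ident(name: str) -> str:
    """Double-quote a column or table name, escaping embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_value(value: Value) -> str:
    """Numbers are emitted bare; strings become single-quoted SQL literals."""
    if isinstance(value, (int, float)):
        return str(value)
    return "'" + str(value).replace("'", "''") + "'"


def sketch_to_sql(header: Sequence[str], sel: int, agg: int,
                  conds: List[Tuple[int, int, Value]], table: str = "t") -> str:
    """Render subtask predictions (SELECT column, aggregation, WHERE
    conditions as (column, operator, value) triples) as a SQL string."""
    column = quote_ident(header[sel])
    select = f"{AGG_OPS[agg]}({column})" if agg else column
    sql = f"SELECT {select} FROM {quote_ident(table)}"
    if conds:
        where = " AND ".join(
            f"{quote_ident(header[col])} {COND_OPS[op]} {quote_value(val)}"
            for col, op, val in conds
        )
        sql += f" WHERE {where}"
    return sql
```

Because every slot is predicted by its own classifier, the output is always syntactically valid within this sketch, which is a trade-off against the flexibility of free-form Seq2Seq generation.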
Evaluation
- Logical Form Accuracy
- Execution Accuracy
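Logical form accuracy checks whether the predicted query itself matches the gold query, while execution accuracy only checks whether both queries return the same result. The sketch below illustrates both on WikiSQL-style sketches and SQLite; comparing conditions order-insensitively and string values case-insensitively is in the spirit of the WikiSQL evaluation but is an assumption here, not this repository's scorer.

```python
import sqlite3
from collections import Counter
from typing import List, Tuple, Union

Value = Union[str, int, float]
# (SELECT column index, aggregation index, [(column, operator, value), ...])
Sketch = Tuple[int, int, List[Tuple[int, int, Value]]]


def _normalized_conds(conds: List[Tuple[int, int, Value]]) -> List[Tuple[int, int, str]]:
    # Order-insensitive, with string values compared case-insensitively.
    return sorted((col, op, str(val).lower()) for col, op, val in conds)


def logical_form_match(pred: Sketch, gold: Sketch) -> bool:
    """Same SELECT column, same aggregation, same set of WHERE conditions."""
    return (pred[0] == gold[0] and pred[1] == gold[1]
            and _normalized_conds(pred[2]) == _normalized_conds(gold[2]))


def execution_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """True if both queries return the same rows (compared as a multiset).
    A predicted query that fails to execute counts as wrong."""
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def accuracy(matches: List[bool]) -> float:
    """Percentage of examples that match."""
    return 100.0 * sum(matches) / len(matches) if matches else 0.0
```

For instance, on a table where every player older than 25 is exactly 30, `WHERE "Age" = 30` and `WHERE "Age" > 25` return the same rows, so `execution_match` is true although the logical forms differ. This is one reason execution accuracy can exceed logical form accuracy.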
Experiments
| Model | Task | Test Logical Form Accuracy (%) | Test Execution Accuracy (%) |
|---|---|---|---|
| SQLova | Subtask | 65.8 | 74.3 |
| HydraNet | Subtask | 40.4 | 40.7 |
| Bridge | Generation | 54.6 | 62.1 |
Download Trained Models
| Method | SQLova | Bridge |
|---|---|---|
| Where+Select | Download | - |
| Where | Download | - |
| Table+Header | Download | - |
| Table | Download | - |
Presentation
Proposal
Interim Findings
Final
Reference
- [1] Zhong, V., Xiong, C., & Socher, R. (2017). Seq2SQL: Generating structured queries from natural language using reinforcement learning.
- [2] Hwang, W., Yim, J., Park, S., & Seo, M. (2019). A comprehensive exploration on WikiSQL with table-aware word contextualization. KR2ML Workshop at NeurIPS 2019.
- [3] Lyu, Q., Chakrabarti, K., Hathi, S., Kundu, S., Zhang, J., & Chen, Z. (2020). Hybrid ranking network for text-to-SQL. arXiv preprint arXiv:2008.04759.
- [4] Lin, X. V., Socher, R., & Xiong, C. (2020). Bridging textual and tabular data for cross-domain text-to-SQL semantic parsing. Findings of EMNLP 2020.



