Auto-SQL-Correction
Code, data, and model for our ACL 2023 paper Text-to-SQL Error Correction with Language Models of Code.
Note: Although the raw code for all experiments has been released, we are still cleaning and reorganizing the repository, so some temporary issues may occur until we finalize it. Please check the following TODO list for our progress:
TODO
- [x] Data and model
- [x] Code for CodeT5-PyDict+Program
- [ ] Code for simulated interaction
- [ ] Code for other experiments
Table of Contents
- Installation
- Data
- Preprocessing
- Training
- Evaluation
- Model Checkpoints
- Citation
Installation
Run the following commands to create a conda environment with Python 3.9 and install the required packages.
conda create -n sqledit python=3.9 pip
conda activate sqledit
pip install -r requirements.txt
Data
First, download the original Spider dataset from this link and unzip it into the data/ folder.
unzip spider.zip -d data/
Then download our synthesized SQL error correction data from this link and place it in the data/ folder as well.
The data/ folder should be organized as follows:
.
├─── data
│ ├─── spider
│ ├─── ...
│ ├─── spider-dev-bridge.json
│ ├─── spider-dev-codet5.json
│ ├─── spider-dev-smbop.json
│ ├─── spider-train-bridge.json
│ ├─── spider-train-codet5.json
│ ├─── spider-train-smbop.json
│ ├─── sqledit_dev_gold.sql
│ ...
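Before preprocessing, it can help to confirm that the folder matches the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_data_entries` is hypothetical, and the list only covers the entries that the listing names explicitly (the elided `...` entries are not checked).

```python
from pathlib import Path

# Names copied from the directory listing above. The listing elides some
# entries ("..."), so this is a lower bound rather than a full manifest.
EXPECTED = [
    "spider",
    "spider-dev-bridge.json",
    "spider-dev-codet5.json",
    "spider-dev-smbop.json",
    "spider-train-bridge.json",
    "spider-train-codet5.json",
    "spider-train-smbop.json",
    "sqledit_dev_gold.sql",
]


def missing_data_entries(data_dir="data"):
    """Return the expected entries that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_data_entries()
    if missing:
        print("Missing from data/:", ", ".join(missing))
    else:
        print("All listed entries are present.")
```

Run it from the repository root so that the relative `data` path resolves to the data/ folder.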
Preprocessing
TODO
python run.py --preproc --use_content --query_type pydict --edit_type program --base_parser smbop
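The flag values `pydict` and `program` suggest that a query is represented as a Python dictionary of clauses and that a correction is expressed as a small program of edits on that dictionary. The sketch below only illustrates that idea under our own assumptions: the clause keys, the tuple encoding of edits, the helper names `apply_edits` and `to_sql`, and the example query are all hypothetical, and the format produced by `run.py` may differ.

```python
# Hypothetical clause-level view of a parser's (incorrect) prediction.
# The key names are illustrative assumptions, not the repository's schema.
predicted = {
    "select": "select name",
    "from": "from singer",
    "where": "where age > 30",
}

# Assumed rendering order when turning the dictionary back into SQL text.
CLAUSE_ORDER = ["select", "from", "where", "group_by", "order_by", "limit"]


def apply_edits(query, edits):
    """Apply clause-level edits to a copy of query.

    Each edit is ("set", clause, text) or ("delete", clause). This tuple
    encoding is a stand-in for whatever edit syntax the model emits.
    """
    fixed = dict(query)
    for edit in edits:
        if edit[0] == "set":
            _, clause, text = edit
            fixed[clause] = text
        elif edit[0] == "delete":
            fixed.pop(edit[1], None)
        else:
            raise ValueError(f"unknown edit operation: {edit[0]!r}")
    return fixed


def to_sql(query):
    """Join the clauses that are present, in CLAUSE_ORDER."""
    return " ".join(query[c] for c in CLAUSE_ORDER if c in query)


edits = [
    ("set", "where", "where age > 20"),
    ("set", "order_by", "order by age desc"),
]
print(to_sql(apply_edits(predicted, edits)))
# prints: select name from singer where age > 20 order by age desc
```

Editing individual clauses keeps the untouched parts of the prediction intact, which is the appeal of predicting edits rather than regenerating the whole query.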
Training
TODO
mkdir model
python run.py --train --load_checkpoint Salesforce/codet5-base --save_checkpoint model/codet5-sqledit --seed 42 --gpu 0
Evaluation
TODO
python run.py --eval --load_checkpoint model/codet5-sqledit --gpu 0
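The preprocessing, training, and evaluation commands can be chained from one script. The sketch below only assembles the three `run.py` invocations shown in this README and prints them; the function name `pipeline_commands` is hypothetical. Passing a `base_parser` other than `smbop` (for example `bridge` or `codet5`) is an assumption inferred from the data file names, not something this README confirms.

```python
import shlex


def pipeline_commands(base_parser="smbop", save_dir="model/codet5-sqledit",
                      seed=42, gpu=0):
    """Build the three run.py invocations from this README as argument lists."""
    preprocess = [
        "python", "run.py", "--preproc", "--use_content",
        "--query_type", "pydict", "--edit_type", "program",
        "--base_parser", base_parser,  # values other than "smbop" are assumed
    ]
    train = [
        "python", "run.py", "--train",
        "--load_checkpoint", "Salesforce/codet5-base",
        "--save_checkpoint", save_dir,
        "--seed", str(seed), "--gpu", str(gpu),
    ]
    evaluate = [
        "python", "run.py", "--eval",
        "--load_checkpoint", save_dir, "--gpu", str(gpu),
    ]
    return [preprocess, train, evaluate]


if __name__ == "__main__":
    for cmd in pipeline_commands():
        # Print each command; to execute it instead, import subprocess
        # and call subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Keeping the commands as argument lists avoids shell quoting problems if a checkpoint path ever contains spaces.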
Model Checkpoints
You may download our pre-trained model checkpoints from this link. They include our CodeT5-PyDict+Program model trained for each of the three text-to-SQL base parsers in our paper.
Citation
@inproceedings{chen-etal-2023-sqledit,
title = "Text-to-SQL Error Correction with Language Models of Code",
author = "Chen, Ziru and
Chen, Shijie and
White, Michael and
Mooney, Raymond and
Payani, Ali and
Srinivasa, Jayanth and
Su, Yu and
Sun, Huan",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)",
year = "2023",
address = "Toronto, Canada",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2305.13073"
}