ludwig
Added ATIS to Ludwig datasets
Unit Test Results
8 files ±0 8 suites ±0 1h 30m 0s :stopwatch: - 1m 1s
2 168 tests ±0 2 138 :heavy_check_mark: ±0 30 :zzz: ±0 0 :x: ±0
8 672 runs ±0 8 552 :heavy_check_mark: ±0 120 :zzz: ±0 0 :x: ±0
Results for commit f738262f. ± Comparison against base commit 4a92ac3f.
:recycle: This comment has been updated with latest results.
There is an issue here. Although that Kaggle dataset is cited as ATIS, it is not actually ATIS, and there is another dataset cited as ATIS, coming from the CNTK repo, that is not ATIS either. All these datasets retain only the intent part of the original ATIS dataset (ATIS2 to be precise, as there are also an ATIS0 and an ATIS3).
I searched around for the original ATIS dataset, and it seems to be quite difficult to find. The best sources I could find are the following:
- https://github.com/yvchen/JointSLU which is the repo of a paper from Gokhan Tür (my former manager at Uber) and Dilek Hakkani-Tür
- https://github.com/D2KLab/botcycle/tree/master/nlu/data/atis/source
- https://www.kaggle.com/siddhadev/atis-dataset-clean-re-split-kernel/notebook a Kaggle notebook that actually downloads the files from the first repo
Besides the intent, these datasets also contain the sequence of IOB tags for the slots of the semantic frames that sentences are annotated with in ATIS. We want the full dataset with the full annotations because with Ludwig we can easily build a joint model for NLU (intent + entity tagging).
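
To make the point about full annotations concrete, here is a minimal sketch of what the IOB slot tags add on top of the intent, and of what a joint config could look like. The column names (`utterance`, `intent`, `slots`), the helper `iob_to_slots`, and the example sentence are assumptions for illustration, not the actual schema of any of the repos above. The config only uses the `text`, `category`, and `sequence` feature types; encoder and decoder choices are left out because their syntax depends on the Ludwig version.

```python
from typing import List, Tuple


def iob_to_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (slot_name, slot_text) pairs."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")

    slots: List[Tuple[str, str]] = []
    current_name = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Close the slot currently being built, if there is one.
        if current_name is not None:
            slots.append((current_name, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        starts_slot = tag.startswith("B-") or (
            # A stray I- tag with a different name is treated as a new slot.
            tag.startswith("I-") and tag[2:] != current_name
        )
        if starts_slot:
            flush()
            current_name, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            current_tokens.append(token)
        else:  # "O" tag: token is outside any slot
            flush()
            current_name, current_tokens = None, []
    flush()
    return slots


# Illustrative row (hypothetical, written in the style of ATIS annotations).
example_tokens = "show flights from new york to denver".split()
example_tags = [
    "O", "O", "O",
    "B-fromloc.city_name", "I-fromloc.city_name",
    "O",
    "B-toloc.city_name",
]
example_slots = iob_to_slots(example_tokens, example_tags)

# Sketch of a joint NLU config: one text input, two outputs.
# Column names are assumptions; decoder options are intentionally omitted.
joint_nlu_config = {
    "input_features": [{"name": "utterance", "type": "text"}],
    "output_features": [
        {"name": "intent", "type": "category"},  # one label per sentence
        {"name": "slots", "type": "sequence"},   # one IOB tag per token
    ],
}
```

With an intent-only dataset, the second output feature has nothing to train on, which is why the version with slot annotations is the one worth adding.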