
Added ATIS to Ludwig datasets

Open ShreyaR opened this issue 3 years ago • 2 comments

ShreyaR, Jan 19 '22 07:01

Unit Test Results

8 files ±0, 8 suites ±0, 1h 30m 0s :stopwatch: -1m 1s
2 168 tests ±0: 2 138 :heavy_check_mark: ±0, 30 :zzz: ±0, 0 :x: ±0
8 672 runs ±0: 8 552 :heavy_check_mark: ±0, 120 :zzz: ±0, 0 :x: ±0

Results for commit f738262f. ± Comparison against base commit 4a92ac3f.

:recycle: This comment has been updated with latest results.

github-actions[bot], Jan 19 '22 08:01

There is an issue here. Although that Kaggle dataset is cited as ATIS, it is not actually ATIS, and there is another dataset, coming from the CNTK repo, that is also cited as ATIS but is not. All these datasets retain only the intent part of the original ATIS dataset (ATIS2 to be precise, as there are also an ATIS0 and an ATIS3).

I searched for the original ATIS dataset, and it seems to be quite difficult to find. The best sources I could find are the following:

  • https://github.com/yvchen/JointSLU, the repo of a paper by Gokhan Tür (my former manager at Uber) and Dilek Hakkani-Tür
  • https://github.com/D2KLab/botcycle/tree/master/nlu/data/atis/source
  • https://www.kaggle.com/siddhadev/atis-dataset-clean-re-split-kernel/notebook, a Kaggle notebook that actually downloads the files from the first repo

These datasets also contain, in addition to the intent, the sequence of IOB tags for the slots of the semantic frames that sentences are annotated with in ATIS. We want the full dataset with the full annotations because with Ludwig we can easily build a joint model for NLU (intent + entity tagging).
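To make the target concrete, here is a rough sketch of what the loader could produce and what a joint config might look like. It assumes (I have not verified this against the files) that each line is laid out as `BOS <tokens> EOS<TAB><one tag per token> <intent>`; the function name `parse_atis_line` and the column names `utterance`, `slots`, and `intent` are placeholders, not something the repos define.

```python
def parse_atis_line(line):
    """Split one annotated line into utterance, IOB slot tags, and intent.

    Assumed layout (hypothetical, check against the actual files):
    'BOS w1 ... wn EOS<TAB>O t1 ... tn intent'
    """
    text_part, annotation_part = line.rstrip("\n").split("\t")
    tokens = text_part.split()
    labels = annotation_part.split()

    # Under the assumed layout, the last label is the intent and the
    # labels before it cover BOS plus every word (EOS has no tag).
    intent = labels[-1]
    slot_tags = labels[1:-1]   # drop the tag for BOS and the intent
    words = tokens[1:-1]       # drop the BOS / EOS markers

    if len(words) != len(slot_tags):
        raise ValueError("tokens and slot tags are not aligned")

    return {
        "utterance": " ".join(words),
        "slots": " ".join(slot_tags),
        "intent": intent,
    }


# Joint NLU config sketch: one text input, two outputs.
# The slots output most likely needs a tagger decoder; the exact
# spelling of that option depends on the Ludwig version, so it is
# left out here.
config = {
    "input_features": [
        {"name": "utterance", "type": "text"},
    ],
    "output_features": [
        {"name": "intent", "type": "category"},
        {"name": "slots", "type": "sequence"},
    ],
}

example = (
    "BOS i want to fly from boston to denver EOS\t"
    "O O O O O O B-fromloc.city_name O B-toloc.city_name atis_flight"
)
row = parse_atis_line(example)
print(row)
```

With rows like this in a dataframe, the intent column and the slots column can be trained jointly from the same encoder, which is the reason for wanting the full annotations rather than the intent-only versions.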

w4nderlust, Jan 19 '22 08:01