asdc
Accommodation Search Dialog Corpus (宿泊施設探索対話コーパス)
Accommodation Search Dialog Corpus (in Japanese)
Main part: data/main
The main part of this corpus consists of 210 Japanese dialogs between two people acting as a customer and an operator in a fictitious accommodation consultation service, conducted over Slack. In each dialog, the customer informed the operator of their situation and needs. Based on that information, the operator conducted a search to meet the customer's request. The dialog ended once the operator judged that the requirements were specific enough to narrow down appropriate accommodations. Dialogs are provided in two formats.
- Text: data/main/dialog/text/*.tsv
- JSON: data/main/dialog/json/*.json
Please read the documents for more details.
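As a rough illustration of how the two dialog formats might be enumerated and read, here is a minimal Python sketch. The helper names (`list_dialog_files`, `read_tsv_rows`, `load_json_dialog`) are hypothetical and not part of this repository. The sketch assumes only the directory layout listed above and deliberately makes no assumption about the TSV columns or the JSON schema, which are described in the documents.

```python
import csv
import json
from pathlib import Path


def list_dialog_files(root):
    """Return sorted paths of the text (TSV) and JSON dialog files under root."""
    dialog_dir = Path(root) / "data" / "main" / "dialog"
    return {
        "text": sorted((dialog_dir / "text").glob("*.tsv")),
        "json": sorted((dialog_dir / "json").glob("*.json")),
    }


def read_tsv_rows(path):
    """Read one TSV dialog file as a list of rows (column meaning is not assumed)."""
    with open(path, encoding="utf-8", newline="") as f:
        # QUOTE_NONE keeps quotation marks inside utterances untouched.
        return list(csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE))


def load_json_dialog(path):
    """Parse one JSON dialog file (the schema is left to the corpus documents)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `list_dialog_files(".")` from the repository root would then give the file lists for both formats, which can be passed one by one to the two readers.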
Annotations
| Name | Doc | Data |
|---|---|---|
| SCUD | Doc | data/main/scud_example/main.Example.jsonl, data/main/scud |
| Dialog act | Doc | data/main/dialog_act |
| Request spans | Doc | data/main/request_span |
The number of SCUDs is about 3,500.
| Name | Utterance | SCUD | DA | RS |
|---|---|---|---|---|
| Agent | さようでございますか。 | | | |
| | それでは、駐車場を無料でご利用できるホテルをお探しします。 | | | |
| | 立地ですが、観光地をまわりやすい場所はいかがでしょうか? | | | |
| User | はい、観光地をまわりやすい場所にあるといいですね。 | ホテルが観光地をまわりやすい場所にあると良い。 | はい | |
| | ただ1番の目的は出雲大社なので、そこまでアクセスがよければ助かります。 | 【customer】の1番の目的が出雲大社だ。 出雲大社までアクセスが良いホテルだと良い。 | 要求 | 出雲大社=>立地 アクセスがよければ=>立地 |
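The RS column in the example above pairs a text span with a slot using the notation `span=>slot`. The following Python sketch splits that display notation into pairs. It assumes pairs are separated by whitespace and that `=>` does not occur inside a span. The function name `parse_request_spans` is hypothetical, and the files under data/main/request_span may store the same information in a different form, so treat this as an illustration of the table notation only.

```python
def parse_request_spans(cell):
    """Split a display string such as 'A=>X B=>Y' into (span, slot) pairs."""
    pairs = []
    for item in cell.split():
        span, sep, slot = item.partition("=>")
        if sep:  # skip tokens that do not contain the '=>' separator
            pairs.append((span, slot))
    return pairs
```

Applied to the last row of the table, this yields two pairs that both map to the slot 立地 (location).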
Supplemental SCUD part: data/supplemental/scud: 57,447 examples
Files in data/supplemental/scud are supplemental fictitious dialogs with SCUD annotations.
Please read the documents for more details.
- Most dialogs consist of a single pair of an agent utterance and a user utterance.
- Dialogs are stored in files in data/supplemental/utterances: 51,390 dialogs
Supplemental correctness-labeled SCUD part: data/supplemental/correctness_labeled_scud: 8,115 examples
Files in data/supplemental/correctness_labeled_scud are supplemental fictitious dialogs with SCUDs and their correctness annotations.
If the value of correct for an example is false, the example contains incorrect SCUDs.
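To show how the correct flag might be used, here is a minimal Python sketch that separates examples with correct SCUDs from those with incorrect ones. It assumes the files are JSON Lines (one JSON object per line) and that each object carries a boolean field named `correct`. The file format is an assumption, and the function name `split_by_correctness` is hypothetical. Check the documents for the actual layout.

```python
import json


def split_by_correctness(lines):
    """Partition JSONL records by their boolean 'correct' field."""
    correct, incorrect = [], []
    for line in lines:
        line = line.strip()
        if not line:  # tolerate blank lines
            continue
        example = json.loads(line)
        # Assumption: every record has a boolean 'correct' key.
        (correct if example["correct"] else incorrect).append(example)
    return correct, incorrect
```

Because the function accepts any iterable of strings, an open file handle can be passed directly, for example `split_by_correctness(open(path, encoding="utf-8"))`.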
Vanilla part: data/vanilla: 74,799 dialogs
Files in data/vanilla are fictitious dialogs or queries made by crowd workers with no SCUD annotations.
Please read the documents for more details.
| Utterance 1 | Utterance 2 |
|---|---|
| あなたが、高級ホテルに泊まるとしたらどのようなホテルに泊まりたいですか? | 食事と景色が美しく、バラ風呂などの工夫があるホテル |
| あなたが、1週間の国内旅行ができることになったら、どのような旅行をしたいですか? | ゆっくり読書をたのしむ旅行 |
References
Dialog collection and SCUDs
- Yuta Hayashibe. Self-Contained Utterance Description Corpus for Japanese Dialog. Proc of LREC, pp.1249-1255. 2022. (LREC 2022) [PDF]
- 林部祐太. 要約付き宿検索対話コーパス. 言語処理学会第27回年次大会論文集,pp.340-344. 2021. (NLP 2021) [PDF]
Dialog acts and request spans
- Hongjie Shi. A Span Extraction Approach for Dialog State Tracking: A Case Study in Hotel Booking Application. 言語処理学会第27回年次大会論文集,pp.1593-1598. 2021. (NLP 2021) [PDF]
- Hongjie Shi. A Sequence-to-sequence Approach for Numerical Slot-filling Dialog Systems. Proc of SIGdial, pp.272-277. 2020. (SIGdial 2020) [PDF]
License
- The corpus, annotations, and documents are licensed under the Creative Commons Attribution 4.0 International License
- Programs are licensed under the Apache License, Version 2.0
