distance-parser
Which version of CTB to use?
Hi,
The CTB version in the paper is 5.1 but this repo links to CTB 8.0. I'm wondering which version should be used to reproduce the experiments? Thank you.
CTB 8.0 and 5.1 are different versions, and we were not able to find the original 5.1 release on the web. [Liu and Zhang. 2017b] use 5.1, and we found that all of the train/valid/test sections they list are present in our 8.0 data. So we extracted the corresponding sections from 8.0 to reconstruct the 5.1 data, and discarded the remaining sections that were not used.
You can find which sections were selected, and how they were divided into train/valid/test splits, in ctb.py.
Ref: Jiangming Liu and Yue Zhang. 2017b. Shift-reduce constituent parsing with neural lookahead features. Transactions of the Association for Computational Linguistics 5:45–58.
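As a rough illustration of the section-based extraction described above, here is a minimal sketch that assigns CTB files to splits by their numeric file ID. It is not copied from ctb.py: the ID ranges are the ones commonly cited for the CTB 5.1 constituency-parsing split and are an assumption here, and the file-name pattern and the helper names `split_for` and `partition` are hypothetical. Check ctb.py for the ranges the repo actually uses.

```python
import re

# Assumed split by file ID (inclusive ranges); verify against ctb.py.
SPLITS = {
    "train": [(1, 270), (440, 1151)],
    "valid": [(301, 325)],
    "test": [(271, 300)],
}

# Hypothetical file-name pattern, e.g. "chtb_0001.nw" or "chtb_001.fid".
FILE_ID = re.compile(r"chtb_(\d+)\.")


def split_for(filename):
    """Return the split name for a CTB file, or None if the file is unused."""
    match = FILE_ID.search(filename)
    if match is None:
        return None
    file_id = int(match.group(1))
    for name, ranges in SPLITS.items():
        if any(lo <= file_id <= hi for lo, hi in ranges):
            return name
    return None


def partition(filenames):
    """Group file names by split, discarding files outside every range."""
    groups = {name: [] for name in SPLITS}
    for filename in filenames:
        name = split_for(filename)
        if name is not None:
            groups[name].append(filename)
    return groups
```

Files whose ID falls outside every range are dropped, which corresponds to discarding the 8.0 sections that the 5.1 split does not use.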
Thanks for your reply. So 5.1 is a subset of 8.0, which can be extracted by ctb.py?
Yes. Each newer version is a superset of older versions, sometimes with minor corrections to errors in previous versions. See https://catalog.ldc.upenn.edu/LDC2013T21
Thank you for the clarification!
Hi Zhouhan,
I followed the instructions to extract the CTB data and got 17,544 training examples, 352 validation examples, and 348 test examples. Is that correct? The validation and test sets seem too small to me, so I'm wondering whether something went wrong in my preprocessing.
Thanks!