discopy
Convert PDTB2 Data to JSON Format for Discourse Parser Training
Issue Description: Hello Rknaebel,
I am working on adapting a discourse parser to work with the Penn Discourse Treebank version 2 (PDTB2) dataset and require assistance in converting the PDTB data into the specific JSON format used by your discourse parser.
Specific Needs:
- Conversion of PDTB2 Data: I have the PDTB2 dataset in CSV format (pdtb2.csv) as well as the WSJ texts and golden files. I need to convert these into en.train, en.dev, en.test, parses.json, and relations.json files. Could you provide guidance on the conversion process? Also, which sections did you use for the train, dev, and test splits?
- Format Specifications: What are the specific format requirements for each of these files? For example, what structure and field names should the relations.json file have?
- Example Code or Scripts: If you have any example scripts or code snippets that could aid in this conversion process, they would be greatly beneficial.
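To make the request concrete, here is a minimal sketch of the kind of conversion being asked about. It is not the parser's confirmed format. It assumes that relations.json follows the CoNLL shared-task layout (one JSON object per line with DocID, ID, Type, Sense, Arg1, Arg2, and Connective), that pdtb2.csv has columns named Relation, Section, FileNumber, ConnHeadSemClass1, and *_SpanList / *_RawText, and that the split is sections 2-21 for train, 22 for dev, and 23 for test. The helper names and the sample row are hypothetical.

```python
import csv
import io
import json


def parse_span_list(span_list):
    """Turn a span string such as '9..35;40..52' into [[9, 35], [40, 52]]."""
    spans = []
    for part in (span_list or "").split(";"):
        part = part.strip()
        if part:
            start, end = part.split("..")
            spans.append([int(start), int(end)])
    return spans


def split_for_section(section):
    """Assumed split: sections 2-21 train, 22 dev, 23 test, anything else unused."""
    section = int(section)
    if 2 <= section <= 21:
        return "train"
    return {22: "dev", 23: "test"}.get(section)


def row_to_relation(row, relation_id):
    """Map one CSV row to a CoNLL-style relation dict (assumed target layout)."""
    def part(prefix):
        return {
            "CharacterSpanList": parse_span_list(row.get(prefix + "_SpanList")),
            "RawText": row.get(prefix + "_RawText") or "",
            # Token indices require alignment with parses.json; left empty here.
            "TokenList": [],
        }

    sense = row.get("ConnHeadSemClass1")
    return {
        "DocID": "wsj_{:02d}{:02d}".format(int(row["Section"]), int(row["FileNumber"])),
        "ID": relation_id,
        "Type": row["Relation"],
        "Sense": [sense] if sense else [],
        "Arg1": part("Arg1"),
        "Arg2": part("Arg2"),
        "Connective": part("Connective"),
    }


# Hypothetical one-row CSV standing in for pdtb2.csv (column names are assumptions).
sample_csv = io.StringIO(
    "Relation,Section,FileNumber,Connective_SpanList,Connective_RawText,"
    "Arg1_SpanList,Arg1_RawText,Arg2_SpanList,Arg2_RawText,ConnHeadSemClass1\n"
    "Explicit,2,3,18..25,because,0..17,He stayed at home,"
    "26..40,it was raining,Contingency.Cause.Reason\n"
)

relations = [row_to_relation(row, i) for i, row in enumerate(csv.DictReader(sample_csv))]

# One JSON object per line, grouped by the assumed split of the row's section.
for relation in relations:
    section = int(relation["DocID"][4:6])
    print(split_for_section(section), json.dumps(relation))
```

The sketch leaves TokenList empty because filling it requires token offsets from parses.json, which is the part where guidance on the expected format would help most.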
Attempted Solutions:
- I have explored the following resources for guidance:
However, I am still facing challenges in adapting these resources to the specific needs of the PDTB2 dataset.
Any Assistance Would Be Highly Appreciated: Your expertise in this area would be immensely helpful for correctly formatting the PDTB2 data for use with the discourse parser.
Thank you for your time and consideration.
Best regards,