
Convert PDTB2 Data to JSON Format for Discourse Parser Training

Open muhammed-saeed opened this issue 5 months ago • 1 comment

Issue Description: Hello Rknaebel,

I am working on adapting a discourse parser to work with the Penn Discourse Treebank version 2 (PDTB2) dataset and require assistance in converting the PDTB data into the specific JSON format used by your discourse parser.

Specific Needs:

  1. Conversion of PDTB2 Data: I have the PDTB2 dataset in CSV format (pdtb2.csv), as well as the WSJ texts and gold files. I need to convert these into en.train, en.dev, en.test, parses.json, and relations.json files. Could you provide guidance on the conversion process? Also, which sections did you use for the train, dev, and test splits?

  2. Format Specifications: What are the specific format requirements for each of these files? For example, what should be the structure and headings in the relations.json file?

  3. Example Code or Scripts: If you have any example scripts or code snippets that could aid in this conversion process, it would be greatly beneficial.
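
To make the request concrete, here is a minimal sketch of the kind of conversion being asked about. It is not the parser's actual preprocessing. It assumes that pdtb2.csv has columns named `Relation`, `Section`, `FileNumber`, `Connective_RawText`, `ConnHeadSemClass1`, `Arg1_RawText`, and `Arg2_RawText`, that relations.json holds one CoNLL-shared-task-style JSON object per line, and that the split is sections 2-21 for train, 22 for dev, and 23 for test. The helper names (`split_for_section`, `row_to_relation`, `convert`) are hypothetical, and character spans and token lists are left out because they require alignment with the WSJ raw text and parses.json.

```python
import csv
import io
import json

# ASSUMPTION: a commonly used section split; not confirmed for this repository.
SPLITS = {"train": range(2, 22), "dev": range(22, 23), "test": range(23, 24)}


def split_for_section(section):
    """Return the split name for a WSJ section, or None if it is unused."""
    for name, sections in SPLITS.items():
        if section in sections:
            return name
    return None


def row_to_relation(row, rel_id):
    """Map one CSV row to a relation dict (ASSUMED column and field names)."""
    section = int(row["Section"])
    file_number = int(row["FileNumber"])
    sense = row.get("ConnHeadSemClass1") or ""
    return {
        "DocID": f"wsj_{section:02d}{file_number:02d}",
        "ID": rel_id,
        "Type": row["Relation"],
        "Sense": [sense] if sense else [],
        # Only raw text is copied; spans and token lists are omitted here.
        "Arg1": {"RawText": row["Arg1_RawText"]},
        "Arg2": {"RawText": row["Arg2_RawText"]},
        "Connective": {"RawText": row.get("Connective_RawText") or ""},
    }


def convert(csv_file):
    """Group relations by split; each entry is one JSON line."""
    out = {name: [] for name in SPLITS}
    for rel_id, row in enumerate(csv.DictReader(csv_file)):
        name = split_for_section(int(row["Section"]))
        if name is not None:
            out[name].append(json.dumps(row_to_relation(row, rel_id)))
    return out
```

With a real file this could be called as `convert(open("pdtb2.csv", newline="", encoding="utf-8"))`, writing each list to the relations.json of the matching split directory. Whether these field names and this split match what the parser expects is exactly what needs confirming.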

Attempted Solutions:

However, I am still facing challenges in adapting these resources to the specific needs of the PDTB2 dataset.

Any Assistance Would Be Highly Appreciated: Your expertise in this area would be immensely helpful for correctly formatting the PDTB2 data for use with the discourse parser.

Thank you for your time and consideration.

Best regards,

muhammed-saeed (Jan 24 '24 02:01)