Right way to use discofuse dataset?

Open akesh1235 opened this issue 1 year ago • 0 comments

Click here for Dataset link

Below is how I understand the dataset should be used. Is this correct?

The columns/features from the DiscoFuse dataset that will be the input to the encoder and decoder are:

  1. coherent_first_sentence

  2. coherent_second_sentence

  3. incoherent_first_sentence

  4. incoherent_second_sentence

The encoder will take these four columns as input and encode them into a sequence of hidden states. The decoder will then take these hidden states as input and decode them into a new sentence that fuses the two original sentences together.

The discourse_type, connective_string, has_coref_type_pronoun, and has_coref_type_nominal columns will not be used as input to the encoder or decoder. They describe each example (the discourse relation, the connective, and whether coreference is involved), but as far as I can tell they are not needed for the sentence fusion task itself.

Please correct me if I am wrong; otherwise, if this understanding is right, how shall I implement this task practically?
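
To make the question concrete, here is a minimal sketch of one possible way to turn a record into a source/target pair. It assumes, unlike the description above, that only the two incoherent sentences form the encoder input and the two coherent sentences form the decoder target; that split is an assumption and has not been confirmed in this thread. The helper name `to_seq2seq_pair` and the sample record are made up for illustration and are not taken from the dataset.

```python
# Column names are the ones listed in this issue; everything else is hypothetical.
SOURCE_COLUMNS = ("incoherent_first_sentence", "incoherent_second_sentence")
TARGET_COLUMNS = ("coherent_first_sentence", "coherent_second_sentence")


def join_nonempty(record, columns, sep=" "):
    """Join the given columns of a record, skipping missing or blank values."""
    parts = []
    for name in columns:
        value = (record.get(name) or "").strip()
        if value:
            parts.append(value)
    return sep.join(parts)


def to_seq2seq_pair(record):
    """Build a source/target text pair (assumed framing: incoherent -> coherent)."""
    return {
        "source": join_nonempty(record, SOURCE_COLUMNS),
        "target": join_nonempty(record, TARGET_COLUMNS),
    }


# Invented example record, only to show the shape of the data.
example = {
    "incoherent_first_sentence": "Ann missed the bus .",
    "incoherent_second_sentence": "Ann walked to work .",
    "coherent_first_sentence": "Ann missed the bus , so she walked to work .",
    "coherent_second_sentence": "",
    "discourse_type": "ignored here",
    "connective_string": "ignored here",
}

pair = to_seq2seq_pair(example)
print(pair["source"])
print(pair["target"])
```

The resulting `source` and `target` strings could then be tokenized and fed to any encoder-decoder model; the metadata columns are simply left out.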

akesh1235, Jun 14 '23 08:06