Right way to use the DiscoFuse dataset?
Click here for the dataset link. Below is my understanding of how the dataset should be used. Is it correct?
The columns/features from the DiscoFuse dataset that will be the input to the encoder and decoder are:
- coherent_first_sentence
- coherent_second_sentence
- incoherent_first_sentence
- incoherent_second_sentence
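For comparison, here is a minimal sketch of another common way to frame DiscoFuse for a seq2seq model, matching how the task is usually described: the two incoherent sentences form the source text for the encoder, and the coherent text is the target the decoder learns to generate. The function name `make_fusion_pair` and the example record are hypothetical (the sentences are invented, not taken from the dataset), and the sketch assumes `coherent_second_sentence` may be empty when the fused text is a single sentence.

```python
from typing import Dict, Tuple

# Hypothetical record using the DiscoFuse column names listed above.
# The sentences are invented for illustration, not taken from the dataset.
example = {
    "incoherent_first_sentence": "Alice was tired.",
    "incoherent_second_sentence": "Alice kept working.",
    "coherent_first_sentence": "Alice was tired, but she kept working.",
    "coherent_second_sentence": "",
}


def make_fusion_pair(record: Dict[str, str]) -> Tuple[str, str]:
    """Return (source, target) strings for a seq2seq fusion model.

    Assumed framing: incoherent pair -> encoder input,
    coherent text -> decoder target. Empty sentences are skipped.
    """
    source = " ".join(
        s
        for s in (record["incoherent_first_sentence"], record["incoherent_second_sentence"])
        if s
    )
    target = " ".join(
        s
        for s in (record["coherent_first_sentence"], record["coherent_second_sentence"])
        if s
    )
    return source, target


if __name__ == "__main__":
    src, tgt = make_fusion_pair(example)
    print(src)  # Alice was tired. Alice kept working.
    print(tgt)  # Alice was tired, but she kept working.
```

Applied to every row, this yields plain (source, target) string pairs that can then be tokenized for any encoder-decoder model.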
The encoder will take these four columns as input and encode them into a sequence of hidden states. The decoder will then take these hidden states as input and decode them into a new sentence that fuses the two original sentences together.
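One detail worth noting about the decoder step: in a standard Transformer encoder-decoder trained with teacher forcing, the decoder receives the encoder hidden states through cross-attention and also the target sequence shifted right by one position, and it is trained to predict the next target token. The sketch below shows only that data preparation on lists of token ids; the ids and the start-token id `0` are made-up placeholders, and `shift_right` / `build_training_example` are hypothetical helpers, not a library API.

```python
from typing import List, Tuple


def shift_right(labels: List[int], decoder_start_id: int) -> List[int]:
    """Decoder inputs for teacher forcing: prepend the start token, drop the last label."""
    return [decoder_start_id] + labels[:-1]


def build_training_example(
    source_ids: List[int], target_ids: List[int], decoder_start_id: int
) -> Tuple[List[int], List[int], List[int]]:
    """Return (encoder_input, decoder_input, labels).

    The encoder reads source_ids. The decoder reads the shifted target
    (together with the encoder hidden states) and is trained so that at
    each position it predicts the corresponding entry of labels.
    """
    return source_ids, shift_right(target_ids, decoder_start_id), target_ids


if __name__ == "__main__":
    # Placeholder ids: [5, 6, 7] stands for the tokenized source,
    # [8, 9, 1] for the tokenized target ending in an end-of-sequence id.
    print(build_training_example([5, 6, 7], [8, 9, 1], decoder_start_id=0))
    # ([5, 6, 7], [0, 8, 9], [8, 9, 1])
```

At inference time there is no target to shift, so the decoder instead starts from the start token and feeds back its own predictions one token at a time.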
The discourse_type, connective_string, has_coref_type_pronoun, and has_coref_type_nominal columns will not be used as input to the encoder or decoder. These columns provide additional annotation about each example, but they are not needed for the sentence-fusion task itself.
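A minimal sketch of keeping the annotation columns out of the model input while still retaining them, for example to break down evaluation results by discourse type. The column names are the ones listed in the post; `split_record`, `count_by_discourse_type`, and the `TYPE_A`/`TYPE_B` labels below are hypothetical, not real DiscoFuse values.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Text columns that feed the model (names as listed in the post).
MODEL_COLUMNS = (
    "coherent_first_sentence",
    "coherent_second_sentence",
    "incoherent_first_sentence",
    "incoherent_second_sentence",
)
# Annotation columns kept aside, not fed to the encoder or decoder.
METADATA_COLUMNS = (
    "discourse_type",
    "connective_string",
    "has_coref_type_pronoun",
    "has_coref_type_nominal",
)


def split_record(record: Dict[str, object]) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Separate the text fields used by the model from the annotation fields."""
    model_part = {k: record[k] for k in MODEL_COLUMNS}
    meta_part = {k: record[k] for k in METADATA_COLUMNS}
    return model_part, meta_part


def count_by_discourse_type(records: Iterable[Dict[str, object]]) -> Counter:
    """Tally examples per discourse_type, e.g. to report metrics per phenomenon."""
    return Counter(record["discourse_type"] for record in records)


if __name__ == "__main__":
    # Hypothetical records with placeholder values.
    rows = [
        {**{k: "" for k in MODEL_COLUMNS}, "discourse_type": t,
         "connective_string": "", "has_coref_type_pronoun": 0.0,
         "has_coref_type_nominal": 0.0}
        for t in ("TYPE_A", "TYPE_B", "TYPE_A")
    ]
    print(count_by_discourse_type(rows))  # Counter({'TYPE_A': 2, 'TYPE_B': 1})
```

The same per-type grouping can be applied to model predictions, so fusion quality can be compared across discourse phenomena rather than only reported as one overall score.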
Please correct me if I am wrong. If this understanding is right, how should I implement this task in practice?