Expectation for data format in seq2seq_attention_copy

Open shwetha-97 opened this issue 2 years ago • 0 comments

In the README for the seq2seq_attention_copy method, I could not work out the difference between the data in the folders data/datasets/data and data/datasets/data_radn_split.

It is mentioned that we have to put the original data in these folders.

It seems to me that the folders data and data_randn_split must contain different data; otherwise the experiments in attn_copying_tune_data_radn_split.yaml and attn_copying_tune_data.yaml would be equivalent. But how do they differ? Is the original data in the Spider dataset split randomly into these two folders? If so, in what ratio should it be split: 50:50 or some other ratio?

As I understand from here, should the folders data and data_randn_split each have their own train, dev and test JSON files? What is the reason for having these two folders, i.e. two different kinds of data?

Dec 03 '23 00:12 shwetha-97