clutrr

Releasing v1.3 data

Open veronica320 opened this issue 1 year ago • 3 comments

Hi authors, thanks for creating this great dataset! Would it be possible to share the "GPT3 cleaned data: CLUTRR v1.3" as mentioned in this blog post? This would save users a lot of time generating the data themselves and enable fair comparison of different methods on the same data. Thanks!

veronica320 avatar Aug 03 '23 22:08 veronica320

Hi, the v1.3 generation tools are provided in the develop branch. We currently provide the tools to generate the data rather than the generated data itself. If you need a fixed dataset for reproducibility, though, I could reuse our Huggingface dataset location and create a new v1.3 dataset in the org. Since CLUTRR supports many configurations, it would help if you could let me know in this issue which ones you would like first, and I'd be happy to generate them.

koustuvsinha avatar Aug 03 '23 22:08 koustuvsinha

Thanks for the quick reply! Would it be possible for you to share the cleaned version for the following configuration from your Google Drive?

  • data_089907f8 - Train: k=2,3, Test: k=2,3,4,5,6,7,8,9,10

veronica320 avatar Aug 03 '23 23:08 veronica320

I'm a bit busy with work, but I'll try to share the new data through Huggingface next week!

Generating the data by following the instructions in the develop branch should be a good starting point if you need the data immediately.

koustuvsinha avatar Aug 04 '23 20:08 koustuvsinha