
Move tutorial datasets to new S3 bucket

Open · julian-risch opened this issue 3 years ago · 0 comments

Now that we have the new S3 bucket https://core-engineering.s3.eu-central-1.amazonaws.com/public/ with its public folder, we should move all datasets used in the tutorials there and possibly rename them.

Some datasets exist as separate copies for each tutorial to facilitate telemetry (each copy's URL maps to one tutorial). We need to decide on a naming scheme for these copies. I would be fine with a number as a suffix, as we have done until now, but maybe we can come up with an alternative? The downside of a number is that it might not stay in sync with the order of the tutorials on our website or with their split into beginner/intermediate/advanced tutorials.
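A minimal sketch of the two naming options, assuming the per-tutorial copies keep a base name such as `wiki_gameofthrones_txt` and sit directly under the new `public/` folder (the folder layout is an assumption). The helpers `numbered_url` and `slugged_url` and the slug `first_qa_system` are hypothetical, for illustration only. A slug names the tutorial rather than its position, so it would not need to change if tutorials are reordered or regrouped.

```python
NEW_BUCKET = "https://core-engineering.s3.eu-central-1.amazonaws.com/public/"

def numbered_url(base_name: str, tutorial_number: int) -> str:
    # Current convention: number appended directly, e.g. wiki_gameofthrones_txt1.zip
    return f"{NEW_BUCKET}{base_name}{tutorial_number}.zip"

def slugged_url(base_name: str, tutorial_slug: str) -> str:
    # Hypothetical alternative: a stable tutorial slug instead of a position number
    return f"{NEW_BUCKET}{base_name}_{tutorial_slug}.zip"

print(numbered_url("wiki_gameofthrones_txt", 1))
print(slugged_url("wiki_gameofthrones_txt", "first_qa_system"))
```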

This is how it's currently done (https://github.com/deepset-ai/haystack/blob/ddeaf2c98c157af1e26c637bcb563c6ea52fdcb7/haystack/telemetry.py#L187):

"https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip": "1",
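The current code maps each full dataset URL to a tutorial identifier. A minimal sketch of how such a lookup could keep working during the move, assuming a plain dict keyed by full URL like the existing one and assuming the file keeps its name in the new bucket. `TUTORIAL_DATASETS` and `tutorial_id_for` are hypothetical names, not the actual Haystack telemetry code. Keeping both the old and the new URL as keys lets either location resolve to the same tutorial while the migration is in progress.

```python
from typing import Optional

OLD_PREFIX = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/"
NEW_PREFIX = "https://core-engineering.s3.eu-central-1.amazonaws.com/public/"

# Hypothetical mapping: dataset URL -> tutorial identifier (a string, as in the current code).
TUTORIAL_DATASETS = {
    OLD_PREFIX + "wiki_gameofthrones_txt1.zip": "1",
    NEW_PREFIX + "wiki_gameofthrones_txt1.zip": "1",  # assumed new location, same file name
}

def tutorial_id_for(url: str) -> Optional[str]:
    """Return the tutorial id for a dataset URL, or None if it is not a tutorial dataset."""
    return TUTORIAL_DATASETS.get(url)

print(tutorial_id_for(NEW_PREFIX + "wiki_gameofthrones_txt1.zip"))  # 1
print(tutorial_id_for("https://example.com/other.zip"))  # None
```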

What do you think? @brandenchan @bilgeyucel

Haystack also needs changes so that telemetry keeps working after the move. That is tracked in a separate Haystack issue: https://github.com/deepset-ai/haystack/issues/3634

julian-risch, Nov 25 '22 13:11