
Publish Static Datasets

Open davidgasquez opened this issue 3 years ago • 8 comments

We should publish the datasets in multiple places.

davidgasquez avatar Mar 14 '23 09:03 davidgasquez

Also, publish via RoAPI.

davidgasquez avatar Jun 23 '23 10:06 davidgasquez

Also, generate a Frictionless package (with Dagster) for the final datasets' Parquet files.
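As a rough sketch of what that package could contain: a Frictionless Data Package is described by a `datapackage.json` file that lists its resources. The snippet below builds such a descriptor with only the standard library. It does not use the `frictionless` library or Dagster, and the function names, the package name, and the flat directory layout are assumptions made for illustration.

```python
import json
from pathlib import Path


def build_datapackage(name: str, data_dir: Path) -> dict:
    """Build a minimal Data Package descriptor for the Parquet files in data_dir.

    Hypothetical helper: assumes all Parquet files sit directly in data_dir
    and that the descriptor will be stored next to them.
    """
    resources = []
    for path in sorted(data_dir.glob("*.parquet")):
        resources.append(
            {
                "name": path.stem.lower(),  # resource names are conventionally lowercase
                "path": path.name,  # relative to the descriptor location
                "format": "parquet",
            }
        )
    return {"name": name, "resources": resources}


def write_datapackage(name: str, data_dir: Path) -> Path:
    """Write datapackage.json into data_dir and return its path."""
    target = data_dir / "datapackage.json"
    target.write_text(json.dumps(build_datapackage(name, data_dir), indent=2))
    return target
```

In a Dagster setup, something like `write_datapackage("datadex", Path("data"))` could run as a final step once the Parquet files exist; both arguments here are placeholders.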

davidgasquez avatar Sep 08 '23 10:09 davidgasquez

It would be nice to expose a static data API (url.com/dataset/partition/data.json) and perhaps some custom graphs at url.com/dataset/partition/.
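A minimal sketch of how that layout could be generated, assuming the dataset is already available as a list of dictionaries and that one column acts as the partition key. The function name and parameters are hypothetical; the output directory could then be served by any static host.

```python
import json
from pathlib import Path


def write_static_api(rows, root: Path, dataset: str, partition_key: str) -> list:
    """Group rows by partition_key and write <root>/<dataset>/<partition>/data.json."""
    partitions = {}
    for row in rows:
        # The partition value becomes a path segment, so it is stringified.
        partitions.setdefault(str(row[partition_key]), []).append(row)

    written = []
    for partition, part_rows in sorted(partitions.items()):
        target = root / dataset / partition / "data.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(part_rows))
        written.append(target)
    return written
```

Writing one file per partition keeps each response small and needs no server-side logic, at the cost of regenerating files whenever the data changes.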

davidgasquez avatar Sep 15 '23 08:09 davidgasquez

Also, publish on GitHub artifacts. PyPI does something like this for some of its datasets, which then surface via a Next.js app.

davidgasquez avatar Oct 16 '23 14:10 davidgasquez

Wow, I didn't know about RoAPI, awesome!

+1 for Parquet files.

I would wait for DuckDB to reach at least 1.0 before using it as a file format.

fredguth avatar Feb 27 '24 21:02 fredguth

I think the DuckDB database could be pushed to Hugging Face too!

https://huggingface.co/docs/huggingface_hub/en/guides/upload#upload-a-file
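Following that guide, the upload could look roughly like the sketch below. It uses `HfApi.upload_file` from `huggingface_hub`; the helper names, the file path, and the repository id are placeholders, and it assumes the dataset repository already exists and that you are authenticated (for example via a stored token).

```python
from pathlib import Path


def build_upload_args(local_path: str, repo_id: str) -> dict:
    """Build keyword arguments for HfApi.upload_file (hypothetical helper)."""
    path = Path(local_path)
    return {
        "path_or_fileobj": str(path),
        "path_in_repo": path.name,  # store the file at the repository root
        "repo_id": repo_id,
        "repo_type": "dataset",
    }


def upload_to_hub(local_path: str, repo_id: str):
    """Upload a single file to a Hugging Face dataset repository."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    return HfApi().upload_file(**build_upload_args(local_path, repo_id))
```

For example, `upload_to_hub("data/datadex.duckdb", "some-user/datadex")` would push the database file, where both values are made up for illustration.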

davidgasquez avatar Apr 08 '24 07:04 davidgasquez

Maybe it is best to wait for the first release version of DuckDB. I heard it will be soon. Meanwhile, I would upload a Parquet file.

fredguth avatar Apr 15 '24 20:04 fredguth