awesome-data: Updating data package lists

I'd like to generate a bit of discussion on this before making any changes.

Should core datasets be present in both core-list.txt and catalog-list.txt?

Currently, 33 of the 35 repos in the core list are also found in catalog-list.txt. I see that deduplication is performed when displaying the lists on data.okfn.org, but perhaps the two lists should be mutually exclusive? If so, I could update the scraper to avoid outputting known core datasets.
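If the two lists were made mutually exclusive, the scraper-side filter could be quite small. The following is a minimal sketch, not the actual scraper: it assumes each list file holds one entry per line, written either as `owner/repo` or as a GitHub URL, and the function names (`read_list`, `normalize`, `exclude_core`) are hypothetical.

```python
from pathlib import Path

GITHUB_PREFIXES = ("https://github.com/", "http://github.com/", "github.com/")


def read_list(path):
    """Return the non-blank lines of a list file (assumed: one entry per line)."""
    return [line.strip() for line in Path(path).read_text().splitlines() if line.strip()]


def normalize(entry):
    """Reduce a GitHub URL or 'owner/repo' string to a lowercase 'owner/repo' key."""
    key = entry.strip().rstrip("/")
    for prefix in GITHUB_PREFIXES:
        if key.lower().startswith(prefix):
            key = key[len(prefix):]
            break
    if key.lower().endswith(".git"):
        key = key[:-4]
    # GitHub owner/repo names are case-insensitive, so compare in lowercase.
    return key.lower()


def exclude_core(candidates, core):
    """Drop candidates already in the core list, keeping the original order and spelling."""
    core_keys = {normalize(c) for c in core}
    return [c for c in candidates if normalize(c) not in core_keys]


if __name__ == "__main__":
    # Hypothetical scraper output and core list, for illustration only.
    scraped = ["https://github.com/datasets/gdp", "datasets/ex-geojson", "someuser/my-data"]
    core = ["datasets/gdp"]
    for entry in exclude_core(scraped, core):
        print(entry)  # prints datasets/ex-geojson, then someuser/my-data
```

Normalizing before comparing is a hedge against the two lists spelling the same repo differently (URL vs. short name, trailing slash, `.git` suffix); if both files already use one canonical form, a plain set difference would do.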

Are all non-example datasets in the datasets organization considered core datasets?

I did a quick check, and datasets/employment-us and datasets/browser-stats are both absent from the core list, yet they seem useful and high-quality.

Should catalog-list.txt be curated?

There are quite a few valid data packages that are examples or tests (e.g. datasets/ex-geojson). Should catalog-list.txt contain all valid data packages, or aim to only include useful data?

May 09 '15 18:05 Deiz

@Deiz, great questions:

I think you are right that the duplication between core and catalog was annoying, and I think that is now fixed.
Ultimately, I think datasets in this org should be core. US employment and browser stats could probably be core (and if not, we can move them out).
At the moment, catalog-list should probably just include everything, including examples...

May 30 '15 08:05 rufuspollock

I filed #130 and #129 separately, since we need two maintainers, for datasets/employment-us and datasets/browser-stats. datasets/ex-geojson was removed from the core list. As such, I think this can be closed, but I want to give people a chance to bump this thread over the next couple of days.

Dec 01 '15 15:12 pdehaye

I found that core-list is now in https://github.com/datasets/core-datasets, but I couldn't find where catalog-list is now.

I wonder whether they are now considered a DataPackage catalog: https://github.com/frictionlessdata/specs/issues/37#issuecomment-552936484

Nov 19 '20 06:11 s-celles
