Not recognizing project as a dataset project if add datapackage after initial project create
Bug description
[!warning] 🚩 2024-06-20 I've updated the repo names for the broken and working projects so the working project is at the nice URL. The broken one is now at https://datahub.io/@rufuspollock/jaan-tallinn-donations-broken. I've updated the text below.
I've just published https://datahub.io/@rufuspollock/jaan-tallinn-donations-broken from https://github.com/rufuspollock/jaan-tallinn-donations. It should show up as a dataset project given I have a `datapackage.yaml`, but that is not happening. Here's the result.
I've now created the site again from scratch and it works 🎉 https://datahub.io/@rufuspollock/jaan-tallinn-donations
Debugging
OK, so I think the source of the problem is the steps by which I created the repo. Steps were something like:
- Create README with frontmatter - https://github.com/rufuspollock/jaan-tallinn-donations/commit/d78ccfef00a558696c899b412701d2f3a2f3767e
- Create site on DataHub Cloud ❌ did not work
  - ❓ Why not? Shouldn't having frontmatter "just work" for being a dataset?
- Move frontmatter out to `datapackage.yaml`
- Try publishing again (auto-publishing in fact) ❌ still not working
So my guess here is that when it was initially published with just a README and frontmatter, it wasn't "seen" as a dataset project. And then, even when the `datapackage.yaml` was added, the "type" of the project in the database didn't change (which is a 🐛).
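A rough sketch of what that guess implies for the fix: re-infer the type on every sync rather than only when the project is first created. Everything here is hypothetical for illustration (the `ProjectRecord` class, the `sync_project` function, the `"site"` / `"dataset"` labels and the list of descriptor filenames); the real ingest code may look quite different.

```python
from dataclasses import dataclass

# Assumption: descriptor filenames that mark a repo as a dataset.
# Only datapackage.yaml is mentioned in this issue; the others are guesses.
DATAPACKAGE_FILES = {"datapackage.json", "datapackage.yaml", "datapackage.yml"}


@dataclass
class ProjectRecord:
    """Hypothetical stand-in for the project row stored in the database."""
    name: str
    project_type: str = "site"


def infer_project_type(paths):
    """Return "dataset" if a datapackage descriptor is among the repo paths."""
    return "dataset" if DATAPACKAGE_FILES & set(paths) else "site"


def sync_project(record, paths):
    """Re-infer the type on every sync, not only when the project is created."""
    record.project_type = infer_project_type(paths)
    return record


# Replay the steps above: first publish with only a README,
# then publish again after datapackage.yaml has been added.
project = sync_project(ProjectRecord("jaan-tallinn-donations"), ["README.md"])
first_type = project.project_type

project = sync_project(project, ["README.md", "datapackage.yaml"])
second_type = project.project_type
```

With this shape, the second sync flips the stored type to `"dataset"`, which is the behaviour that seems to be missing today.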
Thoughts
- We can add a nice new unit test for this in our ingest/processing code
- Would be great to have a simple explicit way to designate the type of a project. Maybe a `type` frontmatter field in the main README. Can still have the magic inference too. `type` may be a bit too generic (it gets used for everything). Maybe `projectType` or `datahubType`. That said, `type` is simple and memorable!
- ❓ Completely remove the "magic" and require an explicit `type` setting
  - The issue with this is that it is not compatible (or backwards compatible) with simple Frictionless datasets
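To make the "explicit `type` plus magic inference" option concrete, here is a minimal sketch, assuming the README frontmatter has already been parsed into a dict. The function name `resolve_project_type`, the `"site"` / `"dataset"` labels and the descriptor filename list are all hypothetical, not the current DataHub behaviour.

```python
# Assumption: descriptor filenames that mark a repo as a dataset.
DATAPACKAGE_FILES = {"datapackage.json", "datapackage.yaml", "datapackage.yml"}


def resolve_project_type(frontmatter, paths):
    """Prefer an explicit `type` in README frontmatter, else fall back to inference."""
    explicit = frontmatter.get("type")
    if explicit:
        return explicit
    # The fallback keeps plain Frictionless datasets working with no README changes.
    return "dataset" if DATAPACKAGE_FILES & set(paths) else "site"


# Explicit setting wins even when no datapackage file exists yet.
explicit_result = resolve_project_type({"type": "dataset"}, ["README.md"])

# No explicit setting: inference from the datapackage file still applies.
inferred_result = resolve_project_type({}, ["README.md", "datapackage.yaml"])
```

Keeping the fallback is what preserves backwards compatibility; dropping it would give the "require explicit `type`" variant from the last bullet.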