Project not recognized as a dataset project if datapackage is added after initial project creation

Open rufuspollock opened this issue 1 year ago • 2 comments

Bug description

[!warning] 🚩 2024-06-20 I've updated the repo names for the broken and working projects so that the working project is at the nice URL. The broken one is now at https://datahub.io/@rufuspollock/jaan-tallinn-donations-broken. I've updated the text below.

I've just published https://datahub.io/@rufuspollock/jaan-tallinn-donations-broken from https://github.com/rufuspollock/jaan-tallinn-donations. It should show up as a dataset project given that I have a datapackage.yaml, but that is not happening. Here's the result.

image

I've now created the site again from scratch and it works 🎉 https://datahub.io/@rufuspollock/jaan-tallinn-donations

image

Debugging

OK, so I think the cause here is the sequence of steps by which I created the repo. The steps were something like:

  • Create README with frontmatter - https://github.com/rufuspollock/jaan-tallinn-donations/commit/d78ccfef00a558696c899b412701d2f3a2f3767e
  • Create site on DataHub Cloud ❌ did not work
    • ❓ Why not - shouldn't having frontmatter "just work" for being a dataset?
  • Move frontmatter out to datapackage.yaml
  • Try publishing again (auto-publishing in fact) ❌ still not working

So my guess here is that when it was initially published with just the README and frontmatter, it wasn't "seen" as a dataset project. And then even when the datapackage.yaml was added, that didn't change the "type" of the project in the database (which is a 🐛).
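A minimal sketch of the suspected failure mode. Everything here is hypothetical: the names `infer_type`, `Project`, `sync_buggy` and `sync_fixed`, the fallback type name "site", and the rule "a datapackage descriptor means dataset" are assumptions for illustration, not the actual DataHub Cloud ingest code. It only shows why inferring the type once at creation would leave a project stuck, and how re-inferring on every sync would avoid that.

```python
from dataclasses import dataclass

# Assumption: the presence of one of these files marks a dataset project.
DATAPACKAGE_FILES = {"datapackage.json", "datapackage.yaml"}


def infer_type(files):
    """Hypothetical inference: "dataset" if a descriptor exists, else "site"."""
    return "dataset" if DATAPACKAGE_FILES & set(files) else "site"


@dataclass
class Project:
    files: list
    type: str = ""

    def __post_init__(self):
        # The type is inferred once, when the project record is created.
        self.type = infer_type(self.files)


def sync_buggy(project, files):
    """Suspected behaviour: files are updated but the stored type is not."""
    project.files = files
    return project


def sync_fixed(project, files):
    """Possible fix: re-run the inference on every sync."""
    project.files = files
    project.type = infer_type(files)
    return project
```

With this sketch, a project first created from only a README keeps type "site" after `sync_buggy` adds `datapackage.yaml`, and only becomes "dataset" after `sync_fixed`.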

Thoughts

  • We can add a nice new unit test for this in our ingest/processing code
  • Would be great to have a simple explicit way to designate the type of a project. Maybe a type frontmatter field in the main README. Can still have the magic inference too.
    • type may be a bit too generic (it gets used for everything). Maybe projectType or datahubType. That said type is simple and memorable!
  • ❓ completely remove the "magic" and require explicit type setting
    • The issue with this is that it is not compatible (or backwards compatible) with simple Frictionless datasets
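One way the "explicit type, with magic inference as a fallback" idea could look, as a rough sketch only. The function `resolve_project_type`, the set of known types, and the use of a plain dict for already-parsed README frontmatter are all assumptions, not an existing DataHub API. An explicit `type` field wins when present; otherwise the presence of a datapackage descriptor decides, which keeps simple Frictionless datasets working without any extra field.

```python
# Assumption: these are the descriptor file names that imply a dataset.
DATAPACKAGE_FILES = {"datapackage.json", "datapackage.yaml"}
# Assumption: hypothetical set of valid project types.
KNOWN_TYPES = {"dataset", "site"}


def resolve_project_type(frontmatter, files):
    """Return the project type from explicit frontmatter, else infer it.

    frontmatter: dict of already-parsed README frontmatter (may be empty).
    files: iterable of file names at the root of the repo.
    """
    explicit = frontmatter.get("type")
    if explicit is not None:
        if explicit not in KNOWN_TYPES:
            raise ValueError(f"unknown project type: {explicit!r}")
        return explicit
    # Fallback "magic": a datapackage descriptor implies a dataset.
    return "dataset" if DATAPACKAGE_FILES & set(files) else "site"
```

The same function could serve as the target of the unit test mentioned above: call it once with only a README, then again after adding `datapackage.yaml`, and check that the result changes.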

rufuspollock, Jun 20 '24 09:06