New Section: How Do Data Packages Compare To...

Open danfowler opened this issue 9 years ago • 6 comments

When the concept of Data Packages is explained to people who already have some familiarity with "containerization", "metadata", and schema languages, a typical question is how Data Packages compare, especially given the work involved in supporting an additional standard.

A comparison table listing the features supported by different formats seems like a good place to point people. In addition, it could make clear potential ways in which two standards can complement each other.

Potential comparisons to make:

  • CSVW: https://www.w3.org/2013/csvw/wiki/Main_Page
  • HDF5/netCDF4: https://www.hdfgroup.org/
  • csv-schema: http://digital-preservation.github.io/csv-schema/csv-schema-1.0.html
  • OAI-ORE: https://www.openarchives.org/ore/
  • Research Objects: http://www.researchobject.org/
  • BagIt: https://en.wikipedia.org/wiki/BagIt
  • SQL: https://en.wikipedia.org/wiki/SQL
  • XML: https://www.w3.org/XML/
  • Serialization formats: Pickle (https://docs.python.org/3/library/pickle.html), Feather (https://blog.rstudio.org/2016/03/29/feather/)

Related: https://github.com/frictionlessdata/project/issues/274

danfowler avatar Aug 08 '16 17:08 danfowler

@sje30 hi Stephen, this particular task I've set for myself was inspired by your comments on HDF5. Do you think there's anything worth adding here?

danfowler avatar Aug 08 '16 19:08 danfowler

hi @danfowler this comparison list looks good, but the problem I find with comparison tables is that they tend to be seen (by me at least) as not objective unless done by an independent person/group.

What might also work is highlighting the "killer features" that would convince people to switch to yet another technology.

(on a minor topic, are things like sql/xml worth comparing too?)

sje30 avatar Aug 08 '16 20:08 sje30

Thanks @sje30 yeah, I understand. I'm thinking less about marketing and more in terms of actually explaining the thing. For instance, I'll be talking to people who know HDF intimately (https://github.com/frictionlessdata/project/issues/290) soon, and it would be good to know the best way to describe how these formats relate.

are things like sql/xml worth comparing too?

I've added these to the issue. Thanks!

danfowler avatar Aug 17 '16 11:08 danfowler

Related: https://discuss.okfn.org/t/w3c-csv-for-the-web-how-does-it-relate-to-data-packages/1715

danfowler avatar Aug 26 '16 16:08 danfowler

Another data format to think about - it's quite new, but it mentions one thing that is a concern: performance. https://blog.rstudio.org/2016/03/29/feather/

sje30 avatar Sep 03 '16 15:09 sje30

Thanks @sje30 I have included it above among the serialization formats!

danfowler avatar Sep 04 '16 02:09 danfowler