
Version control for datasets

Status: Open · hayley-leblanc opened this issue 6 years ago · 2 comments

Is there any work being done on some kind of version control for datasets? I know that there's an extension that provides some of this functionality, and a way to view differences between metadata of different versions was added to the CKAN core a few months ago by @davidread. However, there isn't currently a way to view old versions of resources, or to compare different versions of resource files, or to revert to old versions of datasets, all of which could be really useful.
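To make the "compare different versions of resource files" idea concrete, here is a minimal sketch in Python (the language CKAN is written in). It assumes both versions of a CSV resource are already available locally as text. `diff_resource_versions` is a hypothetical helper, not part of CKAN. It performs a plain line-based diff, so a file whose rows were merely reordered shows up as many changes; a real implementation might instead match rows on a key column.

```python
import difflib
from typing import List


def diff_resource_versions(old_text: str, new_text: str, name: str = "resource.csv") -> List[str]:
    """Return a unified diff between two versions of a CSV resource (hypothetical helper)."""
    # keepends=True preserves each row's newline so the diff output joins cleanly
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile=f"{name}@v1", tofile=f"{name}@v2")
    )


if __name__ == "__main__":
    v1 = "id,name\n1,alpha\n2,beta\n"
    v2 = "id,name\n1,alpha\n2,gamma\n3,delta\n"
    # Prints the unified diff: "-2,beta" removed, "+2,gamma" and "+3,delta" added
    print("".join(diff_resource_versions(v1, v2)))
```

Identical inputs produce an empty list, which a UI could use to report "no changes between these versions".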

hayley-leblanc · Jul 22 '19 06:07

Storing old copies of the CSV files (and other formats) would be nice.

ckanext-archiver goes some way towards this by regularly downloading from the resource URL, but the saved copies are neither indexed nor made available. So this might be a reasonable place to start if you're going to work on it.

davidread · Sep 06 '19 14:09

We at @datopian are working on version control for 2 major clients right now. The code is currently at the POC stage but will, of course, be open source. It would be awesome to talk about your use cases, @hayley-leblanc. There are lots of details to consider, both in what we might refer to as "versioning" versus what we might refer to as "revisioning", and in the types of interactions users want to have with versioned datasets.

pwalsh · Sep 07 '19 19:09