Add command-line tool for dumping all metadata
This addresses issue #98.
dump datasets is stable because its output is always ordered by package id. Does this command have a stable ordering? If not, could you look at adding one? That should also help when concurrent updates, deletions, or creations are happening.
The default sort order is 'relevance asc, metadata_modified desc', so a sort needs to be passed into the package_search call.
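A minimal sketch of paging through package_search with an explicit sort, so every run walks the datasets in the same order. It assumes the dataset identifier field is id and that the caller passes in a package_search callable (for example ckanapi's RemoteCKAN(...).action.package_search). The helper name iter_datasets is hypothetical, not part of ckanapi. A fixed sort makes the order deterministic, but offset paging can still skip or repeat a dataset if records are created or deleted while the dump is running.

```python
from typing import Any, Callable, Dict, Iterator


def iter_datasets(package_search: Callable[..., Dict[str, Any]],
                  rows: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield every dataset returned by package_search, ordered by id.

    Hypothetical helper: package_search is any callable accepting the
    q/sort/rows/start keyword arguments of CKAN's package_search action.
    """
    start = 0
    while True:
        # Passing an explicit sort overrides CKAN's relevance-based default,
        # so page boundaries are the same from one run to the next.
        result = package_search(q="*:*", sort="id asc", rows=rows, start=start)
        page = result["results"]
        if not page:
            break
        yield from page
        start += len(page)
        if start >= result["count"]:
            break


# Usage against a live site (needs network access):
#   from ckanapi import RemoteCKAN
#   ck = RemoteCKAN("https://demo.ckan.org")
#   for pkg in iter_datasets(ck.action.package_search):
#       print(pkg["name"])
```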
Sorting by package_id asc would be nice; then we could easily compare the output with that of dump datasets.
It looks like the metadata id field is called id instead of package_id.
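Once both outputs are sorted by id, they can be compared in a single pass without loading either file into memory. A minimal sketch, assuming both dumps are JSON lines (one dataset dict per line), that each record carries its identifier in id, and that both are sorted in the same string order Python uses (true for plain ASCII ids such as UUIDs). The function name compare_sorted_dumps is hypothetical.

```python
import json
from typing import Iterable, Iterator, Tuple


def compare_sorted_dumps(a_lines: Iterable[str],
                         b_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Merge-walk two JSON-lines dumps that are both sorted by "id".

    Yields (status, id) pairs where status is "only_a", "only_b" or "changed".
    """
    a_iter = (json.loads(line) for line in a_lines if line.strip())
    b_iter = (json.loads(line) for line in b_lines if line.strip())
    a = next(a_iter, None)
    b = next(b_iter, None)
    while a is not None or b is not None:
        if b is None or (a is not None and a["id"] < b["id"]):
            # Record exists only in the first dump.
            yield ("only_a", a["id"])
            a = next(a_iter, None)
        elif a is None or b["id"] < a["id"]:
            # Record exists only in the second dump.
            yield ("only_b", b["id"])
            b = next(b_iter, None)
        else:
            # Same id in both dumps: report it only if the contents differ.
            if a != b:
                yield ("changed", a["id"])
            a = next(a_iter, None)
            b = next(b_iter, None)
```

Without a stable ordering, the same comparison would need to index one whole dump by id first.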
This is excellent work.
Maybe calling it 'dump_datasets2' is a bit more specific than 'dump_metadata'?
Yes, sorry I've been slow in merging this. I like @davidread's command-name suggestion; dump_datasets2 is better. We should also document why you might want to use this command (e.g. accessing sites like data.gov, because it's X% faster, etc.).
Or even better: let's call this command search datasets and accept the same parameters as the package_search call (as you can with ckanapi action package_search ...). That makes this command much more useful and doesn't require strange naming or explanation (like "because data.gov...").
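A rough sketch of how a search datasets command could forward its arguments to package_search. It assumes the command takes arguments in the KEY=STRING and KEY:JSON forms that ckanapi action accepts, and it defaults sort to id asc so output stays stable unless the user overrides it. The helper parse_search_args is hypothetical; ckanapi's real argument parsing may differ.

```python
import json
from typing import Any, Dict, List


def parse_search_args(args: List[str]) -> Dict[str, Any]:
    """Turn KEY=STRING / KEY:JSON arguments into package_search kwargs.

    Hypothetical helper; whichever separator appears first in an argument
    decides its form, so "q=title:water" is read as a plain string.
    """
    params: Dict[str, Any] = {}
    for arg in args:
        eq, colon = arg.find("="), arg.find(":")
        if eq == -1 and colon == -1:
            raise ValueError(f"expected KEY=STRING or KEY:JSON, got {arg!r}")
        if colon != -1 and (eq == -1 or colon < eq):
            key, raw = arg.split(":", 1)
            params[key] = json.loads(raw)  # e.g. rows:100 -> 100
        else:
            key, value = arg.split("=", 1)
            params[key] = value
    # Keep the dump ordering stable unless the caller chose a sort.
    params.setdefault("sort", "id asc")
    return params
```

The resulting dict could then be passed as keyword arguments to package_search, for example ck.action.package_search(**params).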
Yes, that would be even better, although perhaps we've messed the author around enough!
I guess dump_datasets2 is better. I'm not trying to add too much functionality here. If you call it search datasets, it still overlaps with package_search and is more confusing. Maybe it's better to reserve search ... for non-filtering search, such as keyword search.
@ekzhu no worries, I'll finish this off if you're not interested in making my suggested change.
dump is different from package_search, as dump can download resources too.