data_kitten
Get dataset metadata in a consistent format - no matter what you throw at it
A collection of classes that represent Datasets and other concepts, modeled on DCAT
The module is designed to automatically interrogate data sources and give back data
and metadata in a consistent format. The best place to start is probably the Dataset class.
It is designed to handle data from multiple Sources (such as git repositories, local files, remote URLs), Hosts (GitHub, etc.), and PublishingFormats (DataPackage, RDFa, microdata, DSPL, etc.).
It currently supports DataPackages in git repositories (including, but not limited to, GitHub repos). Wider support will follow.
Documentation
Full YARD documentation is available on Rubydoc.info.
Licence
This code is open source under the MIT license. See the LICENSE.md file for full details.
Requirements
- Git ~> 1.2.6
Usage
Pop the gem into your Gemfile:
gem 'data_kitten', :git => "https://github.com/theodi/data_kitten.git"
Require if you need to:
require 'data_kitten'
Request a dataset:
dataset = DataKitten::Dataset.new("https://github.com/theodi/dataset-mod-disposals.git")
Use the results:
dataset.supported?
dataset.origin
dataset.host
dataset.data_title
dataset.documentation_url
dataset.release_type
dataset.time_sensitive?
dataset.publishing_format
dataset.maintainers
dataset.publishers
dataset.licenses
dataset.contributors
dataset.crowdsourced?
dataset.contributor_agreement_url
dataset.distributions
dataset.change_history
# And more to come!
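As a rough sketch of how the accessors above might be used together, the helper below gathers a few of them into a plain Hash. Note that `summarize_dataset` and the stand-in struct are hypothetical names made up for this illustration and are not part of data_kitten. The return types of the real accessors are not documented here, so the values are passed through untouched.

```ruby
# Hypothetical helper (not part of data_kitten): collects a few of the
# accessors listed above into a Hash. The field names come from the usage
# list; what each one returns is not assumed.
def summarize_dataset(dataset, fields = %i[data_title publishing_format release_type licenses maintainers])
  # Stop early if the source is not one data_kitten can read.
  return { supported: false } unless dataset.supported?

  summary = { supported: true }
  fields.each do |field|
    # Skip any accessor the object does not provide.
    next unless dataset.respond_to?(field)

    summary[field] = dataset.public_send(field)
  end
  summary
end

# Stand-in object so the sketch runs without the gem or network access.
# The title and format values here are invented sample data.
fake_class = Struct.new(:data_title, :publishing_format, :licenses) do
  def supported?
    true
  end
end

fake = fake_class.new("Example dataset", :datapackage, [])
p summarize_dataset(fake)
```

With the gem installed, the same helper should accept the `dataset` object created in the usage steps above in place of the stand-in. Checking `supported?` first avoids reading metadata from a source the gem cannot interpret.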
See example usage in a Rails project at https://github.com/theodi/git-data-viewer