open-data-etl-utility-kit
Use Pentaho's open source data integration tool (Kettle) to create Extract-Transform-Load (ETL) processes to update a Socrata open data portal. Documentation is available at http://open-data-etl-utili...
Right now, a substantial portion of the workflow automation is based on Bash scripts (.sh). These files need to be ported so that they are supported on Windows. Namely, the following files...
Removed the quotation marks and added a '/' to the example image for paths in kettle properties.
Admittedly, it's hard to define a solid unit testing framework, but we should have some tests for regression testing. This should be done in Windows (AppVeyor) and Linux (Travis...
Need to change it from "Tom Schenk and Jonathan Levy" to "City of Chicago and contributors"
New datasets often have column widths that are too narrow or too wide. It is possible to get and change the widths, using the views API (e.g., https://data.cityofchicago.org/api/views/ydr8-5enu). This raises...
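As a rough illustration of the "get" half of this idea, the sketch below downloads the metadata for a view and lists the width recorded for each column. It assumes the views API returns JSON with a `columns` array whose entries carry `fieldName` and `width` keys; that layout should be checked against a real response. The function names `fetch_view` and `column_widths` are hypothetical and not part of the kit.

```python
import json
from urllib.request import urlopen


def fetch_view(url):
    """Download and parse the JSON metadata for a single view."""
    with urlopen(url) as resp:
        return json.load(resp)


def column_widths(view):
    """Map each column's field name to its width (None if not set).

    Assumes the metadata has a "columns" list whose items may carry
    "fieldName" and "width" keys; this layout is an assumption.
    """
    return {
        col.get("fieldName"): col.get("width")
        for col in view.get("columns", [])
    }


if __name__ == "__main__":
    # Example URL taken from the issue text above.
    view = fetch_view("https://data.cityofchicago.org/api/views/ydr8-5enu")
    for name, width in column_widths(view).items():
        print(name, width)
```

Changing a width would need an authenticated write request, which is left out here because the exact request shape is not described in the issue.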
With some of our currently manual updates we have been updating the dataset description or title with date information. I have been thinking about ways this could be integrated into...
Since the documentation recommends putting Kettle folders within the kit folder structure, I would suggest adding `data-integration*/` to .gitignore. Maybe also add `DataSync/config.json` so that password and connection settings are...
Need to clarify that the Text File Output may need to be modified to eliminate "File", "DatasetID", "ControlFile", and "File_to_Geocode" fields that can get added when using "Add All" in...
The mechanism for excluding certain datasets with the -e parameter is fairly crude. It certainly could be improved. Maybe read from a text file listing the 4x4's to exclude?
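One possible shape for the text-file approach, as a minimal sketch only: one 4x4 per line, with blank lines and `#` comments ignored. The kit's own scripts are Bash, so this Python version only illustrates the logic. The function names and the file format are assumptions, not existing behavior of the `-e` parameter.

```python
import re
from pathlib import Path

# A 4x4 is assumed to be four lowercase alphanumerics, a hyphen, four more.
FOUR_BY_FOUR = re.compile(r"[a-z0-9]{4}-[a-z0-9]{4}")


def parse_exclusions(text):
    """Return the set of 4x4 identifiers listed in the given text."""
    ids = set()
    for line in text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        if not FOUR_BY_FOUR.fullmatch(entry):
            raise ValueError(f"not a 4x4 identifier: {entry!r}")
        ids.add(entry)
    return ids


def load_exclusions(path):
    """Read an exclusion file (hypothetical format) from disk."""
    return parse_exclusions(Path(path).read_text(encoding="utf-8"))


def filter_datasets(dataset_ids, excluded):
    """Keep only the datasets that are not excluded, preserving order."""
    return [d for d in dataset_ids if d not in excluded]
```

Rejecting malformed lines early means a typo in the exclusion file stops the run instead of letting a dataset slip through.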
Under Setting-up default directories, Read the Docs is somehow unhappy about the combination of the bulleted list and the backslashes. I do not know the markup language well enough to...