errata
Define an errata in table format (CSV) and then apply it to an arbitrary source. Inspired by RFC Errata, it lets you keep your own errata in a transparent way.
Tested in MRI 1.8.7+, MRI 1.9.2+, and JRuby 1.6.7+. Thread safe.
Inspiration
There's a process for reporting errata on RFCs.
Example
Every errata has a table structure based on the IETF RFC Editor's "How to Report Errata".
| date | name | email | type | section | action | x | y | condition | notes |
|---|---|---|---|---|---|---|---|---|---|
| 2011-03-22 | Ian Hough | [email protected] | meta | Intended use | | http://example.com/original-data-with-errors.xls | | | A hypothetical document that uses non-ISO country names |
| 2011-03-22 | Ian Hough | [email protected] | technical | Country Name | replace | /ANTIGUA & BARBUDA/ | ANTIGUA AND BARBUDA | | |
| 2011-03-22 | Ian Hough | [email protected] | technical | Country Name | replace | /BOLIVIA/ | BOLIVIA, PLURINATIONAL STATE OF | | |
| 2011-03-22 | Ian Hough | [email protected] | technical | Country Name | replace | /BOSNIA & HERZEGOVINA/ | BOSNIA AND HERZEGOVINA | | |
| 2011-03-22 | Ian Hough | [email protected] | technical | Country Name | replace | /BRITISH VIRGIN ISLANDS/ | VIRGIN ISLANDS, BRITISH | | |
| 2011-03-22 | Ian Hough | [email protected] | technical | Country Name | replace | /COTE D'IVOIRE/ | CÔTE D'IVOIRE | | |
| 2011-03-22 | Ian Hough | [email protected] | technical | Country Name | replace | /DEM\. PEOPLE'S REP\. OF KOREA/ | KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF | | |
| 2011-03-22 | Ian Hough | [email protected] | technical | Country Name | replace | /DEM\. REP\. OF THE CONGO/ | CONGO, THE DEMOCRATIC REPUBLIC OF THE | | |
| 2011-03-22 | Ian Hough | [email protected] | technical | Country Name | replace | /HONG KONG SAR/ | HONG KONG | | |
| 2011-03-22 | Ian Hough | [email protected] | technical | Country Name | replace | /IRAN \(ISLAMIC REPUBLIC OF\)/ | IRAN, ISLAMIC REPUBLIC OF | | |
Which would be saved as a CSV:
```csv
date,name,email,type,section,action,x,y,condition,notes
2011-03-22,Ian Hough,[email protected],meta,Intended use,,http://example.com/original-data-with-errors.xls,,,A hypothetical document that uses non-ISO country names
2011-03-22,Ian Hough,[email protected],technical,Country Name,replace,/ANTIGUA & BARBUDA/,ANTIGUA AND BARBUDA,,
2011-03-22,Ian Hough,[email protected],technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
2011-03-22,Ian Hough,[email protected],technical,Country Name,replace,/BOSNIA & HERZEGOVINA/,BOSNIA AND HERZEGOVINA,,
2011-03-22,Ian Hough,[email protected],technical,Country Name,replace,/BRITISH VIRGIN ISLANDS/,"VIRGIN ISLANDS, BRITISH",,
2011-03-22,Ian Hough,[email protected],technical,Country Name,replace,/COTE D'IVOIRE/,CÔTE D'IVOIRE,,
2011-03-22,Ian Hough,[email protected],technical,Country Name,replace,/DEM\. PEOPLE'S REP\. OF KOREA/,"KOREA, DEMOCRATIC PEOPLE'S REPUBLIC OF",,
2011-03-22,Ian Hough,[email protected],technical,Country Name,replace,/DEM\. REP\. OF THE CONGO/,"CONGO, THE DEMOCRATIC REPUBLIC OF THE",,
2011-03-22,Ian Hough,[email protected],technical,Country Name,replace,/HONG KONG SAR/,HONG KONG,,
2011-03-22,Ian Hough,[email protected],technical,Country Name,replace,/IRAN \(ISLAMIC REPUBLIC OF\)/,"IRAN, ISLAMIC REPUBLIC OF",,
```
And then used:

```ruby
require 'errata'
require 'remote_table'

# Load the errata table and the source data it corrects
errata = Errata.new(:url => 'http://example.com/errata.csv')
original = RemoteTable.new(:url => 'http://example.com/original-data-with-errors.xls')

original.each do |row|
  errata.correct! row # destructively correct each row
end
```
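To make the effect of a `replace` row concrete, here is a minimal, standalone sketch that parses a few rows in the CSV layout shown above and applies them to a hash. It is not the gem's implementation: the helper names `cell_to_regexp` and `apply_errata!`, the placeholder email address, and the assumption that `replace` swaps the text matched by `x` for `y` in the column named by `section` are all illustrative.

```ruby
require 'csv'

# Illustrative errata rows (the email address is a placeholder).
# The quoted heredoc keeps backslashes such as "\." intact.
ERRATA_CSV = <<~'EOS'
  date,name,email,type,section,action,x,y,condition,notes
  2011-03-22,Ian Hough,ian@example.com,technical,Country Name,replace,/BOLIVIA/,"BOLIVIA, PLURINATIONAL STATE OF",,
  2011-03-22,Ian Hough,ian@example.com,technical,Country Name,replace,/DEM\. REP\. OF THE CONGO/,"CONGO, THE DEMOCRATIC REPUBLIC OF THE",,
EOS

ERRATA_ROWS = CSV.parse(ERRATA_CSV, :headers => true)

# Hypothetical helper: turn a "/pattern/" cell into a Regexp.
def cell_to_regexp(cell)
  Regexp.new(cell.sub(%r{\A/}, '').sub(%r{/\z}, ''))
end

# Hypothetical helper: apply every "replace" row to the column named
# in its "section" cell, modifying the row in place.
def apply_errata!(row, errata_rows)
  errata_rows.each do |erratum|
    next unless erratum['action'] == 'replace'
    column = erratum['section']
    value = row[column]
    next if value.nil?
    # Block form so the correction is never read as a backreference.
    row[column] = value.gsub(cell_to_regexp(erratum['x'])) { erratum['y'] }
  end
  row
end

row = { 'Country Name' => 'DEM. REP. OF THE CONGO' }
apply_errata!(row, ERRATA_ROWS)
puts row['Country Name']
```

Keeping the corrections in a CSV rather than in code means each change stays attributable to a date and a person, which is the transparency the table format is meant to provide.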
UTF-8
errata assumes all input strings are UTF-8. Otherwise, Ruby 1.9 regexps flagged Regexp::FIXEDENCODING can cause problems: an ASCII-8BIT regexp might be applied to a UTF-8 string (or vice versa), raising Encoding::CompatibilityError.
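The failure mode can be reproduced in plain Ruby (1.9 or later) without the gem. The sketch below assumes a UTF-8 source file; `match_error` and `as_utf8` are hypothetical helpers written for this illustration, and retagging with `force_encoding` is only appropriate when the bytes really are UTF-8.

```ruby
# A regexp literal containing non-ASCII characters is pinned to the
# source encoding (UTF-8 here), so it reports a fixed encoding.
PATTERN = /CÔTE D'IVOIRE/

# Simulate input whose bytes are UTF-8 but which is tagged ASCII-8BIT,
# as can happen with data read in binary mode.
BINARY_INPUT = "CÔTE D'IVOIRE".dup.force_encoding(Encoding::ASCII_8BIT)

# Hypothetical helper: return the encoding error raised by a match, or nil.
def match_error(string, pattern)
  string =~ pattern
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Hypothetical helper: retag a copy of the string as UTF-8.
def as_utf8(string)
  string.dup.force_encoding(Encoding::UTF_8)
end

puts PATTERN.fixed_encoding?                        # the regexp is pinned to UTF-8
puts match_error(BINARY_INPUT, PATTERN).class       # mismatch raises an encoding error
puts match_error(as_utf8(BINARY_INPUT), PATTERN).inspect # no error once retagged
```

Normalizing strings to UTF-8 before they reach the errata keeps both sides of every match in the same encoding.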
More advanced usage
The earth library has dozens of real-life examples showing errata in action:
Real-world usage
We use errata for data science at Brighter Planet and in production at
The killer combination:
- active_record_inline_schema - define table structure
- remote_table - download data and parse it
- errata (this library!) - apply corrections in a transparent way
- data_miner - import data idempotently
Authors
- Seamus Abshere [email protected]
- Andy Rossmeissl [email protected]
- Ian Hough [email protected]
Copyright
Copyright (c) 2012 Brighter Planet. See LICENSE for details.

