Support ignoreRows for TabularResource
Overview
The Resource specification was created to describe a concrete data source with metadata. Real-world data sources often have corner cases, such as commented rows or blank rows at the top. A publisher needs a way to share this information with implementations.
Example
https://github.com/frictionlessdata/ADB-User-Study/blob/master/metadata.tsv
It's a valid resource (checked by goodtables) except for rows 2 and 3, which are comments and can't be removed because they are vital metadata for this publisher's tools.
Proposal
Introduce an ignoreRows (or skipRows, or informationalRows, or ?) attribute for the TabularResource specification. This attribute MUST be an array of integers and strings where:
- a number is the row number of a row to ignore
- a string is a prefix: any row whose first characters match it is ignored
Example
ignoreRows = [1, 2, "#", "//"]
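To illustrate the proposed semantics, here is a minimal Python sketch of how an implementation might apply ignoreRows while reading raw lines. The function name `filter_rows` is hypothetical. The sketch also assumes that row numbers are 1-based and that string entries are matched against the start of the raw line; the proposal does not settle either point yet.

```python
from typing import Iterable, Iterator, List, Union


def filter_rows(
    lines: Iterable[str], ignore_rows: List[Union[int, str]]
) -> Iterator[str]:
    """Yield only the lines that ignore_rows does not exclude.

    Assumptions (not fixed by the proposal): row numbers are 1-based,
    and strings are prefixes matched against the raw line text.
    """
    # Split the mixed array into row numbers and string prefixes.
    numbers = {entry for entry in ignore_rows if isinstance(entry, int)}
    prefixes = tuple(entry for entry in ignore_rows if isinstance(entry, str))

    for number, line in enumerate(lines, start=1):
        if number in numbers:
            continue  # excluded by row number
        if prefixes and line.startswith(prefixes):
            continue  # excluded by prefix match
        yield line


# Hypothetical sample data: a header row followed by two comment rows.
sample = [
    "id,name",
    "# units: none",
    "// source: survey",
    "1,english",
    "2,german",
]

kept = list(filter_rows(sample, [2, "//"]))
print(kept)
```

In this sketch, row 2 is dropped by its number and row 3 by the "//" prefix, so the header and the two data rows remain.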
Related
Headers are another example where a publisher may need more granular control over data source rows - https://github.com/frictionlessdata/specs/issues/326
References
- initial discussion - https://github.com/frictionlessdata/goodtables.io/issues/75