Support ignoreRows for TabularResource

Open roll opened this issue 8 years ago • 4 comments

Overview

The Resource specification describes a concrete data source together with its metadata. Real-world data sources often have corner cases, such as commented rows or blank rows at the top of the file. A publisher needs a way to share this information with implementations.

Example

https://github.com/frictionlessdata/ADB-User-Study/blob/master/metadata.tsv

It's a valid resource (checked by goodtables) except for rows 2 and 3, which are comments and can't be removed because they carry metadata that is vital for this publisher's tools.

Proposal

Introduce an ignoreRows (or skipRows, or informationalRows, or ?) attribute for the TabularResource specification. This attribute MUST be an array of integers and strings, where:

  • an integer is the number of a row to ignore
  • a string is a prefix: any row whose first characters match it is ignored

Example

ignoreRows = [1, 2, "#", "//"]
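
To make the proposed semantics concrete, here is a minimal Python sketch of how an implementation might apply such a list while reading rows. The function name `filter_rows` and the sample data are hypothetical, and the sketch assumes row numbers are 1-based and that string rules are matched against the raw text of the line; the proposal does not yet settle either point.

```python
from typing import Iterable, Iterator, List, Union

# A rule is either a row number (int) or a line prefix (str).
IgnoreRule = Union[int, str]


def filter_rows(lines: Iterable[str], ignore_rows: List[IgnoreRule]) -> Iterator[str]:
    """Yield the lines that are not matched by any ignoreRows rule."""
    # Assumption: row numbers are 1-based; the proposal does not specify this.
    numbers = {rule for rule in ignore_rows if isinstance(rule, int)}
    prefixes = tuple(rule for rule in ignore_rows if isinstance(rule, str))
    for number, line in enumerate(lines, start=1):
        if number in numbers:
            continue  # ignored by row number
        if prefixes and line.startswith(prefixes):
            continue  # ignored because the line starts with a listed prefix
        yield line


# Hypothetical sample data: a header row, two comment rows, two data rows.
sample = ["id,name", "# units: none", "// internal note", "1,alice", "2,bob"]
kept = list(filter_rows(sample, [2, "#", "//"]))
```

With these assumptions, `kept` contains only the header row and the two data rows. Whether numbering should instead be 0-based, or whether matching should apply to the first cell rather than the raw line, would need to be decided in the spec.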

Related

Headers are another example where a publisher may need more granular control over data source rows - https://github.com/frictionlessdata/specs/issues/326

References

  • initial discussion - https://github.com/frictionlessdata/goodtables.io/issues/75

roll, Dec 20 '16 11:12