datapackage-pipelines

Add load parameter to capture skipped rows metadata

Open cschloer opened this issue 5 years ago • 1 comment

Hey,

I understand why this feature is outside the scope of tabulator (https://github.com/frictionlessdata/tabulator-py/issues/331). I still think it would be an important feature to implement in load. As proposed in that issue:

The parameter takes a list of dicts, each containing a regular expression string with one capture group and a string with a column name. Each regular expression is matched against every skipped row in the data. For each match, a new column is created with column_name as its name and the value of the capture group as its value.
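To make the proposal concrete, here is a minimal sketch in plain Python. It is only an illustration: the function names and the `regex` / `column_name` dict keys are assumptions made up for this example, not an existing load or dataflows API, and it assumes the skipped rows are available as strings or lists of cells.

```python
import re


def capture_skipped_metadata(skipped_rows, patterns):
    """Collect {column_name: captured value} from skipped rows.

    `patterns` is a list of dicts with hypothetical keys `regex`
    (a pattern with exactly one capture group) and `column_name`.
    """
    compiled = [(re.compile(p["regex"]), p["column_name"]) for p in patterns]
    metadata = {}
    for row in skipped_rows:
        # Assumption: a skipped row is either a raw string or a list of cells.
        if isinstance(row, str):
            text = row
        else:
            text = ",".join("" if cell is None else str(cell) for cell in row)
        for regex, column_name in compiled:
            match = regex.search(text)
            # Keep the first match for each column and ignore later ones.
            if match and column_name not in metadata:
                metadata[column_name] = match.group(1)
    return metadata


def add_metadata_columns(rows, metadata):
    """Yield each data row with the captured values added as new columns."""
    for row in rows:
        yield {**row, **metadata}


# Example: two comment rows that would normally be skipped.
skipped = ["# Station: Helgoland", "# Depth (m): 12.5"]
patterns = [
    {"regex": r"Station:\s*(\w+)", "column_name": "station"},
    {"regex": r"Depth \(m\):\s*([\d.]+)", "column_name": "depth"},
]
metadata = capture_skipped_metadata(skipped, patterns)
rows = list(add_metadata_columns([{"temp": 8.1}, {"temp": 8.4}], metadata))
```

In this sketch every data row gets the same captured values, and the values stay strings. Whether a real implementation should cast them or handle multiple matches differently is an open question.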

If you don't think this would be useful for the general DPP/dataflows community, let me know and I can implement it in our own custom load processor.

@roll @akariv

cschloer avatar Jun 16 '20 11:06 cschloer

@cschloer Let's discuss on Monday what's the best place we can put it in (PR to dataflows/custom/etc)

roll avatar Jun 21 '20 08:06 roll