
RDB Shredder: consider disabling validation against JSON Schema

Open chuwy opened this issue 6 years ago • 5 comments

In my experience, enriched data implies that the raw data was not just enriched, but also validated - we never add invalid contexts/unstruct events to final enriched event.

Yet validation is quite a compute-heavy process (not compared to distributed IO, but still). So in the end we're wasting resources re-validating data that is already valid.

A question for someone who is familiar with bad rows: what is the most common type of error in the shredded/bad bucket?

E.g. Snowflake Transformer does not do any validation and seems quite happy with that.

chuwy avatar Mar 14 '18 15:03 chuwy

> we never add invalid contexts/unstruct events to final enriched event.

I'm not sure that's true. @BenFradet can confirm if the validation of the overall event including all contexts is the last thing that happens before writing it out.

alexanderdean avatar Mar 14 '18 22:03 alexanderdean

Custom contexts and unstruct events are validated during enrichment.

Derived contexts are indeed not validated. To me, though, it sounds very reasonable to validate derived contexts as well, so that there is always a guarantee that the shred job receives valid enriched data.
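To make the proposal concrete, here is a minimal sketch of a shred step with a switch for skipping validation. It is not RDB Shredder code: `shred`, `is_valid`, the `validate` flag and the toy "schema" format (field name mapped to expected type) are all assumptions made up for illustration, standing in for real JSON Schema validation.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical stand-in for a JSON Schema: required field name -> expected type.
Schema = Dict[str, type]
Context = Dict[str, Any]


def is_valid(data: Dict[str, Any], schema: Schema) -> bool:
    # Toy check only: every required field is present with the expected type.
    return all(isinstance(data.get(key), typ) for key, typ in schema.items())


def shred(
    contexts: List[Context],
    schemas: Dict[str, Schema],
    validate: bool = True,
) -> Tuple[List[Context], List[Context]]:
    """Split contexts into (good, bad); with validate=False all input is trusted."""
    good: List[Context] = []
    bad: List[Context] = []
    for ctx in contexts:
        schema = schemas.get(ctx["schema"])
        if validate and (schema is None or not is_valid(ctx["data"], schema)):
            bad.append(ctx)
        else:
            good.append(ctx)
    return good, bad
```

With `validate=False` nothing ever reaches the bad output, so the switch is only safe if everything upstream, derived contexts included, has already been validated.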

chuwy avatar Mar 15 '18 08:03 chuwy

Isn't removing validation introducing coupling between the two?

Also, somewhat unfortunately, I don't think bypassing validation will save us a lot of time / resources.

BenFradet avatar Mar 15 '18 10:03 BenFradet

> Isn't removing validation introducing coupling between the two?

To be honest, this kind of coupling is one of the goals I had in mind. I would like to add more meaning to "enriched" state of event. E.g. "enriched" means the event is in a canonical state, ready for loading/processing, and hence fully valid, so no validation-related error can appear during shredding/transformation.
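One way to picture that contract is to let a type carry the guarantee. This is a rough sketch under my own assumptions, not actual pipeline code: `EnrichedEvent`, `enrich`, `check` and `InvalidContext` are hypothetical names, and `check` is a placeholder for real JSON Schema validation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple

Context = Dict[str, Any]


class InvalidContext(Exception):
    """Raised when a context fails the (hypothetical) schema check."""


def check(ctx: Context) -> None:
    # Placeholder for JSON Schema validation: only requires a string
    # "schema" key and a dict "data" payload.
    if not isinstance(ctx.get("schema"), str) or not isinstance(ctx.get("data"), dict):
        raise InvalidContext(repr(ctx))


@dataclass(frozen=True)
class EnrichedEvent:
    """By convention built only via enrich(), so its contexts are already valid."""
    contexts: Tuple[Context, ...]


def enrich(custom: List[Context], derived: List[Context]) -> EnrichedEvent:
    # Validate custom AND derived contexts once, at the enrichment boundary.
    for ctx in custom + derived:
        check(ctx)
    return EnrichedEvent(contexts=tuple(custom + derived))


def shred(event: EnrichedEvent) -> List[Context]:
    # No re-validation here: the EnrichedEvent type carries the guarantee.
    return list(event.contexts)
```

The coupling is explicit in this shape: `shred` is only as safe as `enrich` is strict, which is the trade-off raised above.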

chuwy avatar Mar 15 '18 12:03 chuwy

> I would like to add more meaning to "enriched" state of event

Yes, I understand the intent, it makes sense to me. Not sure how it maps onto this ticket in the short- to mid-term.

alexanderdean avatar Mar 15 '18 12:03 alexanderdean