Quarantine data in silver layer
We normally need to apply data quality rules when moving data from the bronze to the silver layer, so we need to quarantine data in the silver layer. Adding a quarantine feature in the silver layer is currently not supported. Will it be available soon?
Usually bronze is the entry point where customers quarantine data and send it back to the source. We can introduce the quarantine feature in silver too. It might land in the v0.0.10 release, since the v0.0.9 release is finalized.
Thanks for accepting it as an enhancement for v0.0.10. In our case, we need to keep the bronze layer free of any data quality rules, send the good-quality data to silver, and quarantine the bad data in silver only. The purpose of the quarantine is not to send the data back to the source but to have detailed data quality monitoring of the rows that are rejected. The handshake with the source system will still be manual and done by the respective data product owner.
If we quarantine the data in the bronze layer itself, we will lose the idea of keeping as-is source data in bronze.
My thinking is similar to what Sanket mentioned on 16-Oct-24: I want to store both the good and the bad records in bronze and apply DQE in the silver layer. Eagerly waiting for v0.0.10 to be finalised.
New Silver Quarantine Table Attributes Introduced in onboarding.json:
- silver_catalog_quarantine
- silver_database_quarantine
- silver_quarantine_table
- silver_quarantine_table_properties
- silver_quarantine_cluster
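As a rough illustration of how the attributes above might appear in an onboarding.json entry, here is a minimal Python sketch that builds and prints such a fragment. Only the attribute names come from this thread. Every value, the value shapes (a dict for table properties, a list for the cluster columns), and the helper name `with_silver_quarantine` are assumptions made for the example, so please check them against the feature branch.

```python
import json

# Attribute names are taken from the list above.
# All values and their shapes are hypothetical placeholders, not documented defaults.
SILVER_QUARANTINE_ATTRS = {
    "silver_catalog_quarantine": "my_catalog",
    "silver_database_quarantine": "my_silver_db",
    "silver_quarantine_table": "customers_quarantine",
    "silver_quarantine_table_properties": {"pipelines.reset.allowed": "false"},
    "silver_quarantine_cluster": ["customer_id"],
}


def with_silver_quarantine(entry: dict) -> dict:
    """Return a copy of an onboarding entry with the quarantine attributes added.

    Hypothetical helper for illustration only; it is not part of the project.
    """
    merged = dict(entry)
    merged.update(SILVER_QUARANTINE_ATTRS)
    return merged


# A made-up, stripped-down onboarding entry used only to show the merge.
example_entry = {"data_flow_id": "100"}
onboarding_entry = with_silver_quarantine(example_entry)
print(json.dumps(onboarding_entry, indent=2))
```

The merge returns a new dict so the original entry stays untouched, which makes it easy to compare an entry before and after enabling quarantine.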
Also Added:
- expect_or_quarantine in silver_data_quality_expectations.json
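For anyone trying this out, a silver_data_quality_expectations.json with the new key might be generated along these lines. This is a sketch under assumptions: it supposes the file maps each expectation type to rule-name/SQL-expression pairs, and the rule names, column names, and the output path are invented for the example. Whether a quarantine expression should describe valid rows or rejected rows is not stated in this thread, so verify that against the branch before relying on it.

```python
import json
import tempfile
from pathlib import Path

# Assumed layout: expectation type -> {rule name: SQL expression}.
# Rule names and column names are hypothetical.
silver_expectations = {
    "expect_or_drop": {
        "valid_customer_id": "customer_id IS NOT NULL",
    },
    "expect_or_quarantine": {
        "quarantine_rule": "customer_id IS NULL OR email IS NULL",
    },
}

# Write to a temporary directory so the example does not touch a real project tree.
out_dir = Path(tempfile.mkdtemp())
out_file = out_dir / "silver_data_quality_expectations.json"
out_file.write_text(json.dumps(silver_expectations, indent=2))

# Read the file back to confirm it is valid JSON with the new key present.
loaded = json.loads(out_file.read_text())
print(sorted(loaded))
```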
✅ To run tests:
python integration_tests/run_integration_tests.py --uc_catalog_name=<<catalog_name>> --source=cloudfiles
📊 Silver Onboarding DataflowSpec for customer and transaction feeds:
🔁 Silver Layer DLT for customer and transaction tables:
@sanketkaleda @aayrm5 – Please test this feature branch and share any feedback before we merge into feature/v0.0.10 for release.