FlowKit
Add more cell info checks
Closes #5868, closes #5869
I have:
- [x] Formatted any Python files with black
- [x] Brought the branch up to date with master
- [x] Added any relevant GitHub labels
- [ ] Added tests for any new additions
- [ ] Added or updated any relevant documentation
- [ ] Added an Architectural Decision Record (ADR), if appropriate
- [ ] Added an MPLv2 License Header if appropriate
- [x] Updated the Changelog
Description
Adds some of our cell info QA checks and slightly reorganises the existing checks to allow that (doesn't address #6497). The CDR type checks move to a shared subdirectory and are symlinked from the type-specific subdirectories.
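
For reviewers unfamiliar with the layout, here is a minimal sketch of the symlinking idea. The function name, the directory names and the `*.sql` glob are assumptions for illustration only; they are not taken from the FlowKit source tree.

```python
import os
from pathlib import Path


def link_shared_checks(qa_root: Path, shared_dir: str, cdr_types: list[str]) -> list[Path]:
    """Symlink each template in qa_root/shared_dir into qa_root/<cdr_type>/.

    Hypothetical helper: the directory layout and names are assumptions.
    """
    created = []
    shared = qa_root / shared_dir
    for cdr_type in cdr_types:
        type_dir = qa_root / cdr_type
        type_dir.mkdir(parents=True, exist_ok=True)
        for template in sorted(shared.glob("*.sql")):
            link = type_dir / template.name
            if link.is_symlink() or link.exists():
                # Leave type-specific overrides and existing links untouched.
                continue
            # Use a relative target so the links survive moving the whole tree.
            link.symlink_to(os.path.relpath(template, start=type_dir))
            created.append(link)
    return created
```

Relative targets keep the links valid when the directory tree is copied elsewhere, provided the copy step preserves symlinks or copies the files they point to.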
- To see the specific tasks where the Asana app for GitHub is being used, see below:
- https://app.asana.com/0/0/1203946764283648
Summary by CodeRabbit
New Features
- Added QA checks for missing or invalid latitude/longitude, duplicate and new cell IDs in staging.
- Introduced a general QA check to count rows in staging tables.
- Added duplicate record detection and MSISDN counting for calls, SMS, MDS, and topups datasets.
- Enabled specifying additional QA checks for staging and extract stages in the pipeline.
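
As a rough illustration of what the cell info checks listed above compute, the sketch below runs equivalent queries against an in-memory SQLite database. This is an assumption-laden stand-in: the table names (`staging_cells`, `known_cells`), the column names and the use of SQLite are all hypothetical, and the actual checks are not reproduced here.

```python
import sqlite3

# Hypothetical check queries; table and column names are assumptions.
CELL_CHECKS = {
    "count_rows": "SELECT COUNT(*) FROM staging_cells",
    "count_missing_lat_lon": (
        "SELECT COUNT(*) FROM staging_cells WHERE lat IS NULL OR lon IS NULL"
    ),
    # Rows with a NULL coordinate are only counted by the check above.
    "count_invalid_lat_lon": (
        "SELECT COUNT(*) FROM staging_cells "
        "WHERE lat NOT BETWEEN -90 AND 90 OR lon NOT BETWEEN -180 AND 180"
    ),
    "count_duplicate_cell_ids": (
        "SELECT COUNT(*) FROM ("
        "SELECT cell_id FROM staging_cells GROUP BY cell_id HAVING COUNT(*) > 1)"
    ),
    "count_new_cell_ids": (
        "SELECT COUNT(DISTINCT s.cell_id) FROM staging_cells s "
        "LEFT JOIN known_cells k ON s.cell_id = k.cell_id "
        "WHERE k.cell_id IS NULL"
    ),
}


def run_cell_checks(conn: sqlite3.Connection) -> dict[str, int]:
    """Run every check query and return {check_name: count}."""
    return {name: conn.execute(sql).fetchone()[0] for name, sql in CELL_CHECKS.items()}
```

Each check returns a single count, so the results can be recorded uniformly regardless of which check produced them.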
Refactor
- Organised QA checks by ETL stage (extract, staging, final) with updated task IDs including data type and stage suffixes.
- Simplified topups duplicate counting queries by removing conditional logic.
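
To make the task ID change concrete, here is a small sketch of deriving an ID from a template path, a data type and the stage directory it sits in. The ID format, the function name and the example path are hypothetical; only the three stage names come from the summary above.

```python
from pathlib import PurePosixPath

# Stage names as listed in the summary; everything else here is an assumption.
STAGES = ("extract", "staging", "final")


def task_id_for(template_path: str, cdr_type: str) -> str:
    """Build an ID like 'count_duplicates_calls_staging' (hypothetical format)."""
    path = PurePosixPath(template_path)
    # The first path component that names a stage decides the stage suffix.
    stage = next((part for part in path.parts if part in STAGES), None)
    if stage is None:
        raise ValueError(f"No ETL stage directory in {template_path!r}")
    return f"{path.stem}_{cdr_type}_{stage}"
```

Including both the data type and the stage in the ID keeps the same check name distinct when it runs for several data types or at more than one stage.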
Tests
- Parameterised tests to cover multiple QA check stages and updated integration tests to use stage-specific templates.
- Added tests for path disambiguation logic in QA check discovery.
Chores
- Expanded packaging and Docker build contexts to include all QA check files.
- Updated Dockerfiles to reference the final QA checks directory.