Add more cell info checks

Open greenape opened this issue 6 months ago • 3 comments

Closes #5868, closes #5869

I have:

  • [x] Formatted any Python files with black
  • [x] Brought the branch up to date with master
  • [x] Added any relevant GitHub labels
  • [ ] Added tests for any new additions
  • [ ] Added or updated any relevant documentation
  • [ ] Added an Architectural Decision Record (ADR), if appropriate
  • [ ] Added an MPLv2 License Header if appropriate
  • [x] Updated the Changelog

Description

Adds some of our cell info QA checks and slightly reorganises the existing checks to allow that (this doesn't address #6497). The CDR type checks move to a subdirectory and are symlinked from the type-specific subdirectories.
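
As a rough illustration of the reorganisation described above, here is a minimal sketch of linking shared checks into per-type directories. The function name, the `cdr`/`calls`/`sms` directory names, the `.sql` suffix and the check file name are all assumptions made for the example, not taken from this PR.

```python
import tempfile
from pathlib import Path


def link_shared_checks(root: Path, shared_dir: str, type_dirs: list[str]) -> list[Path]:
    """Symlink every check file in root/shared_dir into each root/<type_dir>.

    Hypothetical helper: directory names and the '.sql' suffix are assumptions.
    """
    created = []
    shared = root / shared_dir
    for type_dir in type_dirs:
        target_dir = root / type_dir
        target_dir.mkdir(parents=True, exist_ok=True)
        for check in sorted(shared.glob("*.sql")):
            link = target_dir / check.name
            if link.is_symlink() or link.exists():
                continue  # leave anything that is already there untouched
            # Relative target, so the tree stays valid if it is moved or copied.
            link.symlink_to(Path("..") / shared_dir / check.name)
            created.append(link)
    return created


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "cdr").mkdir()
        (root / "cdr" / "count_duplicates.sql").write_text("SELECT 1")
        for link in link_shared_checks(root, "cdr", ["calls", "sms"]):
            print(link.relative_to(root))
```

Relative link targets are used here so that copying the whole tree (for example into a Docker build context) keeps the links resolvable, provided the copy preserves symlinks.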


  • To see the specific tasks where the Asana app for GitHub is being used, see below:
    • https://app.asana.com/0/0/1203946764283648

Summary by CodeRabbit

  • New Features

    • Added QA checks for missing or invalid latitude/longitude, duplicate and new cell IDs in staging.
    • Introduced a general QA check to count rows in staging tables.
    • Added duplicate record detection and MSISDN counting for calls, SMS, MDS, and topups datasets.
    • Enabled specifying additional QA checks for staging and extract stages in the pipeline.
  • Refactor

    • Organised QA checks by ETL stage (extract, staging, final) with updated task IDs including data type and stage suffixes.
    • Simplified topups duplicate counting queries by removing conditional logic.
  • Tests

    • Parameterised tests to cover multiple QA check stages and updated integration tests to use stage-specific templates.
    • Added tests for path disambiguation logic in QA check discovery.
  • Chores

    • Expanded packaging and Docker build contexts to include all QA check files.
    • Updated Dockerfiles to reference the final QA checks directory.
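
To make the cell info checks listed under New Features more concrete, here is a minimal sketch of what they count, written in plain Python over in-memory rows. The PR's own checks run against staging tables and their implementation is not shown here, so the function name, the row keys (`cell_id`, `latitude`, `longitude`) and the result keys are assumptions for illustration only.

```python
from collections import Counter


def cell_info_checks(rows, known_cell_ids):
    """Return simple QA counts for an iterable of cell info rows.

    Hypothetical sketch: each row is a dict with the assumed keys
    'cell_id', 'latitude' and 'longitude'.
    """
    rows = list(rows)
    ids = [r["cell_id"] for r in rows]

    def bad_coord(value, limit):
        # Missing, or outside the valid range [-limit, limit].
        return value is None or not (-limit <= value <= limit)

    return {
        "count_rows": len(rows),
        "count_invalid_lat_lon": sum(
            1
            for r in rows
            if bad_coord(r["latitude"], 90) or bad_coord(r["longitude"], 180)
        ),
        # Number of distinct cell IDs that appear more than once.
        "count_duplicate_cell_ids": sum(1 for n in Counter(ids).values() if n > 1),
        # Cell IDs present in this batch but not previously known.
        "count_new_cell_ids": len(set(ids) - set(known_cell_ids)),
    }
```

For example, a batch with one repeated cell ID, one row missing a latitude and one row with a latitude of 95 would report one duplicate and two invalid coordinates.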

greenape, Jun 20 '25 11:06