
Handle type mismatch conflicts better with `dlt`

Open Jabolol opened this issue 2 months ago • 2 comments

Right now, runs fail on a type mismatch. If the API is flaky and its schema changes between runs, dlt should handle the change instead of breaking the pipeline.

Some extra thoughts here: https://github.com/opensource-observer/oso/issues/5186#issuecomment-3369178423

Jabolol avatar Oct 28 '25 13:10 Jabolol

OSO-1214

linear[bot] avatar Oct 28 '25 13:10 linear[bot]

It seems the API schema changed significantly between runs. Some fields no longer return any data:

2025-10-28 15:12:06 +0100 - dagster - INFO - __ASSET_JOB - 7155b5f1-dc6b-4d12-8156-905c70efcb94 - giveth__qf_rounds - GraphQLFactory: Completed fetching 17 total items across 1 successful pages
2025-10-28 15:12:07,418|[WARNING]|81169|8582907904|dlt|logger.py|wrapper:40|In schema `qf_rounds`: The following columns in table 'qf_rounds' did not receive any data during this load and therefore could not have their types inferred:
  - giving_blocks_id
  - change_id
  - youtube
  - co_ordinates
  - project_qf_round_relations
  - stripe_account_id
  - donations
  - reactions
  - social_media
  - anchor_contracts
  - status_history
  - project_verification_form
  - featured_update
  - project_future_power
  - project_instant_power
  - verification_form_status
  - social_profiles
  - project_estimated_matching_view
  - project_url
  - prev_status_id
  - project_update
  - project_updates
  - admin_js_base_url
  - reaction
  - campaigns
  - cause_projects
  - deposit_tx_chain_id
  - chain_id

Unless type hints are provided, these columns will not be materialized in the destination.
One way to provide type hints is to use the 'columns' argument in the '@dlt.resource' decorator.  For example:

@dlt.resource(columns={'giving_blocks_id': {'data_type': 'text'}})

2025-10-28 15:12:07 +0100 - dagster.daemon.QueuedRunCoordinatorDaemon - INFO - 1 runs are currently in progress. Maximum is 1, won't launch more.
2025-10-28 15:12:07 +0100 - dagster - DEBUG - __ASSET_JOB - 7155b5f1-dc6b-4d12-8156-905c70efcb94 - 81169 - giveth__qf_rounds - STEP_OUTPUT - Yielded output "result" of type "Any". (Type check passed).
2025-10-28 15:12:07 +0100 - dagster - DEBUG - __ASSET_JOB - 7155b5f1-dc6b-4d12-8156-905c70efcb94 - 81169 - giveth__qf_rounds - ASSET_MATERIALIZATION - Materialized value giveth qf_rounds.
2025-10-28 15:12:07 +0100 - dagster - DEBUG - __ASSET_JOB - 7155b5f1-dc6b-4d12-8156-905c70efcb94 - 81169 - giveth__qf_rounds - STEP_SUCCESS - Finished execution of step "giveth__qf_rounds" in 9m20s.
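
Following the hint in the warning, one possible approach is to generate the `columns` argument for the affected fields instead of writing each entry by hand. This is only a sketch: `build_column_hints` and `EMPTY_COLUMNS` are hypothetical names, the list is a subset of the columns in the log, and defaulting everything to `text` is an assumption (a field such as `chain_id` may really be an integer).

```python
from typing import Dict, Iterable

# A subset of the columns named in the dlt warning; extend as needed.
EMPTY_COLUMNS = ["giving_blocks_id", "change_id", "youtube", "chain_id"]


def build_column_hints(
    names: Iterable[str], data_type: str = "text"
) -> Dict[str, Dict[str, object]]:
    """Map each column name to a hint dict in the shape shown in the warning.

    The data type is a guess: the API returned no data for these columns,
    so the real type has to be confirmed against the upstream schema.
    """
    return {name: {"data_type": data_type, "nullable": True} for name in names}


hints = build_column_hints(EMPTY_COLUMNS)

# Hypothetical usage, mirroring the decorator form from the warning:
# @dlt.resource(name="qf_rounds", columns=hints)
# def qf_rounds(): ...
```

With hints like these the empty columns would be materialized in the destination even when a load carries no data for them, at the cost of fixing types that have not been verified.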

Jabolol avatar Oct 28 '25 14:10 Jabolol