pdr-backend
[Lake][DuckDB] ETL - Implement Update Queries
Motivation
We have verified that the basic lake functionality is working as expected.
Next, we want to verify data quality and completeness.
This requires more work in the ETL step, so that more tables are processed and richer data is generated:
- Additional SQL queries run in the ETL step
- Slot and other tables are processed
- New rows in tables like pdr_payouts trigger updates to bronze_predictions
- Null entries inside bronze_predictions are eventually filled in by these updates
Update Step - Incrementally updating the Lake
When you run the "lake update" command, the later SQL queries are responsible for updating the lake with the most recent information.
- When the lake updates, newly arrived records need to be processed
- Where applicable, these new records (such as pdr_payouts) should (a) be cleaned up into their raw/bronze table and (b) update other tables to reflect the new event
- After all records have been yielded to temp tables and the pipeline ends, the records should be available in the live/production tables.
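A minimal sketch of the update step for payouts, assuming bronze_predictions and pdr_payouts can be joined on a shared `id` column and that each prediction has at most one payout. The column names (`id`, `payout`) are hypothetical and the real lake schema may differ. The sketch uses the `duckdb` Python package when it is installed and falls back to the standard-library `sqlite3` module otherwise, so the SQL is kept to syntax both engines accept.

```python
# Sketch only: column names (id, payout) are hypothetical, not the lake's real schema.
try:
    import duckdb  # the engine used by the lake
    con = duckdb.connect()  # in-memory database
except ImportError:
    import sqlite3  # stdlib fallback so the sketch runs without extra packages
    con = sqlite3.connect(":memory:")

con.execute("CREATE TABLE bronze_predictions (id VARCHAR, payout DOUBLE)")
con.execute("CREATE TABLE pdr_payouts (id VARCHAR, payout DOUBLE)")

# Predictions arrive before their payout is known, so payout starts as NULL.
con.executemany(
    "INSERT INTO bronze_predictions VALUES (?, ?)",
    [["pred-1", None], ["pred-2", None]],
)
# A later "lake update" fetches a payout event for pred-1 only.
con.execute("INSERT INTO pdr_payouts VALUES ('pred-1', 2.5)")

# Fill NULL payouts from the newly arrived payout rows.
con.execute("""
    UPDATE bronze_predictions
    SET payout = (
        SELECT p.payout FROM pdr_payouts p
        WHERE p.id = bronze_predictions.id
    )
    WHERE payout IS NULL
      AND id IN (SELECT id FROM pdr_payouts)
""")

rows = con.execute("SELECT id, payout FROM bronze_predictions ORDER BY id").fetchall()
print(rows)  # [('pred-1', 2.5), ('pred-2', None)]
```

A correlated subquery is used instead of `UPDATE ... FROM` because SQLite only added the latter in 3.33.0; DuckDB accepts both forms. Predictions whose payout has not arrived yet (pred-2) keep their NULL until a later update.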
Data Workflows
All data workflows should operate in the same way:
- All data that needs to be written out is first written into a temp table.
- As temp tables are filled with new data, views expose them so that downstream queries can access both old and new data from a single query.
- Once all processes have completed and their data is written to temp tables, we do a final merge/update of the rows into the final/live/production tables.
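The temp-table, view, and final-merge workflow described above can be sketched as follows. The names `temp_pdr_payouts` and `etl_pdr_payouts` and the `id`/`payout` columns are hypothetical, and the delete-then-insert merge is one assumed strategy, not necessarily the one the lake uses. As before, it prefers `duckdb` and falls back to `sqlite3`.

```python
# Sketch only: table/view names and the merge strategy are illustrative assumptions.
try:
    import duckdb  # the engine used by the lake
    con = duckdb.connect()  # in-memory database
except ImportError:
    import sqlite3  # stdlib fallback so the sketch runs without extra packages
    con = sqlite3.connect(":memory:")

# Live/production table with one existing row.
con.execute("CREATE TABLE pdr_payouts (id VARCHAR, payout DOUBLE)")
con.execute("INSERT INTO pdr_payouts VALUES ('pay-1', 1.0)")

# 1. New data is written only to a temp table; the live table is untouched.
con.execute("CREATE TABLE temp_pdr_payouts (id VARCHAR, payout DOUBLE)")
con.executemany(
    "INSERT INTO temp_pdr_payouts VALUES (?, ?)",
    [["pay-2", 3.0], ["pay-3", 0.0]],
)

# 2. A view lets downstream queries read old and new rows in a single query.
con.execute("""
    CREATE VIEW etl_pdr_payouts AS
    SELECT * FROM pdr_payouts
    UNION ALL
    SELECT * FROM temp_pdr_payouts
""")
n = con.execute("SELECT COUNT(*) FROM etl_pdr_payouts").fetchall()[0][0]

# 3. After every step has finished, merge temp rows into the live table.
#    Delete-then-insert makes re-processed ids replace their older rows.
con.execute("DELETE FROM pdr_payouts WHERE id IN (SELECT id FROM temp_pdr_payouts)")
con.execute("INSERT INTO pdr_payouts SELECT * FROM temp_pdr_payouts")

# Clean up: drop the view before the table it depends on.
con.execute("DROP VIEW etl_pdr_payouts")
con.execute("DROP TABLE temp_pdr_payouts")

live = con.execute("SELECT id FROM pdr_payouts ORDER BY id").fetchall()
print(n, live)  # 3 [('pay-1',), ('pay-2',), ('pay-3',)]
```

In a real pipeline the delete and insert would likely be wrapped in a single transaction, so readers of the live table never see a half-applied merge.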
DoD:
- [ ] Tables like truevals and payouts are being processed
- [ ] bronze_predictions is being updated as a result of truevals and payouts being processed
- [ ] Other tables and bronze tables are currently not processed
Task:
- [ ] Process new pdr-payouts into DuckDB
- [ ] Process new pdr-truevals into DuckDB
- [ ] Verify the incremental update step works #982
- [ ] Create SQL that processes new pdr-payouts to update bronze-predictions
- [ ] Create SQL that processes new pdr-truevals to update bronze-predictions
- [ ] Verify that null records inside bronze-predictions are being updated correctly
- [ ] Verify everything is working e2e
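For the truevals tasks, a minimal sketch of the update plus a verification query for null records. It assumes truevals can be joined to predictions on a hypothetical `slot_id` column and that each slot has at most one trueval; `id`, `slot_id`, and `trueval` are illustrative names, not the lake's confirmed schema. It prefers `duckdb` and falls back to `sqlite3`.

```python
# Sketch only: column names (id, slot_id, trueval) are hypothetical.
try:
    import duckdb  # the engine used by the lake
    con = duckdb.connect()  # in-memory database
except ImportError:
    import sqlite3  # stdlib fallback so the sketch runs without extra packages
    con = sqlite3.connect(":memory:")

con.execute("CREATE TABLE bronze_predictions (id VARCHAR, slot_id VARCHAR, trueval BOOLEAN)")
con.execute("CREATE TABLE pdr_truevals (slot_id VARCHAR, trueval BOOLEAN)")

# Three predictions with unknown truevals; only slot-A has been resolved so far.
con.executemany(
    "INSERT INTO bronze_predictions VALUES (?, ?, ?)",
    [["pred-1", "slot-A", None], ["pred-2", "slot-A", None], ["pred-3", "slot-B", None]],
)
con.execute("INSERT INTO pdr_truevals VALUES (?, ?)", ["slot-A", True])

# Fill NULL truevals for predictions whose slot now has a trueval.
con.execute("""
    UPDATE bronze_predictions
    SET trueval = (
        SELECT t.trueval FROM pdr_truevals t
        WHERE t.slot_id = bronze_predictions.slot_id
    )
    WHERE trueval IS NULL
      AND slot_id IN (SELECT slot_id FROM pdr_truevals)
""")

# Verification: no prediction should still be NULL once its slot has a trueval.
unresolved = con.execute("""
    SELECT COUNT(*) FROM bronze_predictions
    WHERE trueval IS NULL
      AND slot_id IN (SELECT slot_id FROM pdr_truevals)
""").fetchall()[0][0]
filled = con.execute(
    "SELECT COUNT(*) FROM bronze_predictions WHERE trueval IS NOT NULL"
).fetchall()[0][0]
still_null = con.execute(
    "SELECT id FROM bronze_predictions WHERE trueval IS NULL ORDER BY id"
).fetchall()
print(unresolved, filled, still_null)  # 0 2 [('pred-3',)]
```

A non-zero `unresolved` count would indicate that the update missed rows it should have filled; `pred-3` legitimately stays NULL because no trueval for slot-B has arrived yet.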