bdit_data-sources
bdit_data-sources copied to clipboard
Miovision: Should we filter unacceptable_gaps from volumes_15min_mvt via view instead of during insert?
Pros:
- aggregate tables and unacceptable_gaps could not get out of sync! (#868)
- enables the 15min look behind over midnight in find_gaps which is otherwise problematic to reflect in downstream tables
- faster + less complex volumes_15min_mvt insert function
Cons:
- slightly slower selects (+3s / +25% for 1 year as per below)
- more tables/views to confuse people
Example implementation and testing:
CREATE TABLE gwolofs.unacceptable_gaps_indexed AS (
SELECT * FROM miovision_api.unacceptable_gaps
)
CREATE INDEX unacceptable_gaps_intersection_datetime_bin_idx
ON gwolofs.unacceptable_gaps_indexed
USING btree (intersection_uid, datetime_bin);
--DROP VIEW gwolofs.volumes_15min_mvt_filtered;
CREATE VIEW gwolofs.volumes_15min_mvt_filtered AS (
SELECT
v1m.volume_15min_mvt_uid,
v1m.intersection_uid,
v1m.datetime_bin,
v1m.classification_uid,
v1m.leg,
v1m.movement_uid,
CASE
WHEN un.datetime_bin IS NULL THEN v1m.volume
ELSE NULL
END AS volume
FROM miovision_api.volumes_15min_mvt AS v1m --this would change to _unfiltered and have zeros instead of nulls
LEFT JOIN gwolofs.unacceptable_gaps_indexed AS un USING (intersection_uid, datetime_bin)
)
--test view performance:
--15.345 + 15.324s + 15.317s : 15.32 avg
SELECT intersection_uid, classification_uid, AVG(volume)
FROM gwolofs.volumes_15min_mvt_filtered
WHERE datetime_bin >= '2023-01-01' AND datetime_bin < '2024-01-01'
--new index useful on smaller slices
--WHERE datetime_bin >= '2023-01-01' AND datetime_bin < '2024-01-01' AND intersection_uid = 65
GROUP BY intersection_uid, classification_uid
--baseline:
--12.2s / 11.724s / 12.119s: 12.01s avg
SELECT intersection_uid, classification_uid, AVG(volume)
FROM miovision_api.volumes_15min_mvt
WHERE datetime_bin >= '2023-01-01' AND datetime_bin < '2024-01-01'
GROUP BY intersection_uid, classification_uid
Is volumes_15min_mvt
the only table affected by this issue? Would we have to do this on volumes_15 table as well ?
You're right, we'd need to add the same view/table treatment to volumes_15min
which increases the complexity a bit. Also we'd need to edit many views which refer to these:
volumes_15min:
_RETURN ON aduyves.aadt_miovision_avg_daily _RETURN ON data_requests.i0533_intersec_uoft_atr _RETURN ON covid.miovision_hourly _RETURN ON covid.miovision_hourly_new _RETURN ON covid.miovision_hourly_temp _RETURN ON rapidto.miovision_segments_comparison_daily _RETURN ON rapidto.miovision_segments_comparison_hourly _RETURN ON activeto.miovision_volumes_15min_adj
volumes_15min_mvt:
_RETURN ON data_requests.i0533_intersec_uoft_tmc _RETURN ON gwolofs.open_leg_issues _RETURN ON miovision_api.volumes_15min_tmc
plus all the insert and clear functions
@chmnata one more option I thought of is to change the aggregation order. Instead of find_gaps
-> volumes_15min*
we could do volumes_15min*
->find_gaps
and add an update volumes_15min*
clause to find_gaps
using the new inserts.
This would require much fewer changes to database than above.