gcp-ingestion

Union together different channels for Fenix views

Open jklukas opened this issue 5 years ago • 2 comments

The GUD datasets currently include only the release version of Fenix and they ignore any data in the org_mozilla_fenix_nightly_stable dataset. We should probably build in that support.

But it brings up a bigger question of how we want to present Fenix data to users. Should we have separate ETL pathways for the different source tables, unioning together the final results? Or should we union together these different channels as early as possible?

We could alter the org_mozilla_fenix.baseline view to be a union of the release and nightly tables, setting the normalized_channel field to "release" for rows coming from one table and "nightly" for rows coming from the other. That approach would be vulnerable to schema drift between the two tables; it's not clear to me whether the probes are sourced independently for the different Fenix channels or whether we should always expect the schemas to match exactly. If the schemas ever stopped matching, the view would return errors, which would be a bad user experience.
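To make the schema-drift concern concrete, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for BigQuery. The table names `release_baseline` and `nightly_baseline`, the `new_probe` column, and the helper functions are all hypothetical, and the real view definition would likely look different. The point it illustrates holds in both engines: a `UNION ALL` over `SELECT *` only works while both inputs have the same number of columns.

```python
import sqlite3

# Body of the hypothetical union view: tag each row with its channel,
# then stack the release and nightly rows on top of each other.
UNION_SQL = """
SELECT 'release' AS normalized_channel, * FROM release_baseline
UNION ALL
SELECT 'nightly' AS normalized_channel, * FROM nightly_baseline
"""


def build_db(nightly_extra_column=False):
    """Create toy per-channel tables; optionally let nightly drift."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE release_baseline (client_id TEXT, submission_date TEXT)"
    )
    nightly_cols = "client_id TEXT, submission_date TEXT"
    if nightly_extra_column:
        # Simulated drift: a probe that exists only in nightly (assumption).
        nightly_cols += ", new_probe INTEGER"
    conn.execute(f"CREATE TABLE nightly_baseline ({nightly_cols})")
    conn.execute("INSERT INTO release_baseline VALUES ('a', '2020-01-21')")
    conn.execute(
        "INSERT INTO nightly_baseline (client_id, submission_date) "
        "VALUES ('b', '2020-01-21')"
    )
    return conn


def read_union_view(conn):
    """Define the union view and read every row back from it."""
    conn.execute("CREATE VIEW baseline AS " + UNION_SQL)
    return sorted(conn.execute("SELECT * FROM baseline").fetchall())


if __name__ == "__main__":
    print(read_union_view(build_db()))
    try:
        read_union_view(build_db(nightly_extra_column=True))
    except sqlite3.OperationalError as err:
        print("union view failed after schema drift:", err)
```

With matching schemas the view returns one tagged row per channel. Once nightly gains a column, defining or reading the view raises an error, which is the user-facing failure described above.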

It would certainly be possible to union the two tables at the clients_daily level and let rows from nightly flow through that way.
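A rough sketch of what unioning at the clients_daily level could look like, again using Python's sqlite3 as a stand-in for BigQuery. All table, view, and column names here (`release_clients_daily`, `n_pings`, `new_probe`, and so on) are made up for illustration, and the per-channel aggregates are modeled as views only to keep the example short. Each channel is aggregated on its own, and only a small, explicitly listed set of columns is unioned, so an extra column in the raw nightly table does not affect the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Raw per-channel tables; nightly has an extra column (simulated drift).
    CREATE TABLE release_baseline (client_id TEXT, submission_date TEXT);
    CREATE TABLE nightly_baseline (
        client_id TEXT, submission_date TEXT, new_probe INTEGER
    );
    INSERT INTO release_baseline VALUES
        ('a', '2020-01-21'), ('a', '2020-01-21');
    INSERT INTO nightly_baseline VALUES ('b', '2020-01-21', 1);

    -- Per-channel daily aggregates: one row per client per day.
    CREATE VIEW release_clients_daily AS
    SELECT submission_date, client_id, COUNT(*) AS n_pings
    FROM release_baseline
    GROUP BY submission_date, client_id;

    CREATE VIEW nightly_clients_daily AS
    SELECT submission_date, client_id, COUNT(*) AS n_pings
    FROM nightly_baseline
    GROUP BY submission_date, client_id;

    -- Union only the aggregated outputs, with an explicit column list.
    CREATE VIEW clients_daily AS
    SELECT 'release' AS normalized_channel,
           submission_date, client_id, n_pings
    FROM release_clients_daily
    UNION ALL
    SELECT 'nightly' AS normalized_channel,
           submission_date, client_id, n_pings
    FROM nightly_clients_daily;
    """
)

rows = sorted(conn.execute("SELECT * FROM clients_daily").fetchall())
for row in rows:
    print(row)
```

The trade-off is that the raw baseline views stay per-channel, so anyone querying below the clients_daily level still has to pick a channel or write the union themselves.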

Or we could duplicate all the queries from Fenix release to Fenix nightly. This is the purest solution, but leads to code duplication and proliferation of tasks in Airflow.

cc @fbertsch @relud

jklukas avatar Jan 21 '20 21:01 jklukas

> It would certainly be possible to union the two tables at the clients_daily level and let rows from nightly flow through that way.

seems like a nice compromise

relud avatar Jan 21 '20 22:01 relud

union views for clients_daily-like tables have been deployed as part of https://bugzilla.mozilla.org/show_bug.cgi?id=1708166

jklukas avatar May 24 '21 17:05 jklukas