data-models
⚠️ MAINTENANCE-ONLY MODE: Snowplow-maintained SQL data models for working with Snowplow web and mobile behavioral data.
Currently, if we run the base module on a period of time (which starts at `start_date` or the last processed event, and ends at that time plus `update_cadence_days`), then the model...
In case the [PII pseudonymization enrichment](https://docs.snowplowanalytics.com/docs/enriching-your-data/available-enrichments/pii-pseudonymization-enrichment/) is enabled and run, the length of the target fields may change depending on the hashing algorithm used. The complete list of fields that...
### Issue
In `users_sessions_this_run` we filter `start_tstamp` from the `sessions` on `lower_limit` and `upper_limit` from `user_limits`. By joining on `start_tstamp` we attempt to pull info from the first session per...
Currently it is tricky to backfill new custom modules. The easiest path is to tear everything down and start again. This is inefficient, particularly when the custom module is completely...
Columns that are defined as VARCHAR in the main model are defined as CHAR in the custom table. https://github.com/snowplow/data-models/blob/master/web/v1/redshift/sql-runner/sql/custom/02-page-views-join/01-page-views-join-setup.sql#L4-L9 This led to the following failure on deployment of the table:...
An edge case has come up where two things conspired to produce duplicates in the model: 1. Some users' data have different session_ids at the same time (we think because...
There is no setting to set the role of the user that will be used to run all scripts. This leads to unexpected results (incorrect owner group in case there is more...
Currently sessions greater in length than `days_late_allowed` (default 3 days) will be included/excluded depending on the update cadence of the model. This is undesirable as the model should be deterministic....
Currently `domain_sessionid` is required for events to be processed by the base module. Create a generic base module that doesn't require any grouping identifiers like `domain_sessionid`. This will allow for...
We have had a report of a difference between the old web model vs new on BQ, where a lot more 'stray pings' are excluded. Needs investigation, but I suspect...