snowplow-rdb-loader
RDB Loader: add alerting for bad warehouse configuration
We often see a warehouse become mis-configured unexpectedly. For example, a warehouse admin might remove a permission from the loader role, which then prevents the loader from loading a batch. The proposal is for the loader to query the warehouse to discover whether everything is configured as expected. For example, it could check that the table exists, that the loading stage exists, and that the loader role has been granted sufficient permissions.
The loader is a long-running process, whereas these types of mis-configuration can arise at any time, so it is not sufficient to just check at startup. Instead, I suggest the loader runs these checks immediately after any batch fails to load. If the loader detects a mis-configuration then it should send an alert message.
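To make the proposed flow concrete, here is a minimal sketch of "run the checks only after a failed load, then alert". It is written in Python for brevity rather than in the loader's own language, and every name in it (`HealthCheck`, `diagnose`, `load_with_diagnosis`, `send_alert`) is hypothetical, not part of the existing loader.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class HealthCheck:
    """A single warehouse check (hypothetical type, for illustration only)."""
    name: str
    # Returns None when the check passes, or a description of the problem.
    run: Callable[[], Optional[str]]


def diagnose(checks: List[HealthCheck]) -> List[str]:
    """Run every check and collect a message for each one that fails."""
    problems = []
    for check in checks:
        try:
            problem = check.run()
        except Exception as exc:
            # A check that cannot run at all is itself worth reporting.
            problem = f"check could not run: {exc}"
        if problem is not None:
            problems.append(f"{check.name}: {problem}")
    return problems


def load_with_diagnosis(
    load_batch: Callable[[], None],
    checks: List[HealthCheck],
    send_alert: Callable[[str], None],
) -> bool:
    """Load one batch; on failure, run the checks and alert if any fail."""
    try:
        load_batch()
        return True
    except Exception as load_error:
        problems = diagnose(checks)
        if problems:
            send_alert(
                f"Load failed ({load_error}); "
                "warehouse mis-configuration detected: " + "; ".join(problems)
            )
        return False
```

Because the checks run only on the failure path, a healthy loader pays no extra cost per batch, and a load failure that is not caused by mis-configuration produces no alert from this path.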
The old Snowflake Loader also had some additional checks, e.g. we need to make sure that every incoming SQS message refers to a folder within the configured stage. Otherwise the load will be a silent no-op.
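A rough sketch of that folder check, in Python for brevity. It assumes both the stage location and the folder from the message are available as plain URLs such as `s3://bucket/path/`; the function name `is_within_stage` and the example URLs are made up for illustration.

```python
from urllib.parse import urlparse


def is_within_stage(stage_url: str, folder_url: str) -> bool:
    """Return True if folder_url points at or below the stage location."""
    stage = urlparse(stage_url)
    folder = urlparse(folder_url)

    # Scheme and bucket must match exactly.
    if (stage.scheme, stage.netloc) != (folder.scheme, folder.netloc):
        return False

    # Normalise to exactly one trailing slash so that "/shredded" does not
    # accidentally match "/shredded-other/...".
    stage_path = stage.path.rstrip("/") + "/"
    folder_path = folder.path.rstrip("/") + "/"
    return folder_path.startswith(stage_path)
```

For example, `is_within_stage("s3://my-bucket/shredded/", "s3://my-bucket/shredded/run=1/")` holds, whereas a folder in another bucket or under `s3://my-bucket/shredded-other/` is rejected, which is the case that would otherwise be a silent no-op.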
Regarding the changed title of this issue: I was thinking we could implement this for all destinations, not just Snowflake. There must be similar examples where a Redshift load fails because the loader does not have the required permissions?
Just talked to @stdfalse. He has noticed two very rare misconfigurations:
- S3 Bucket permissions (actually common with Snowflake)
- Redshift Load role
- ...and I also think we can check table integrity, i.e. that tables match their expected schema, although this check can be relatively expensive
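For the table integrity idea, the comparison itself could be as simple as the sketch below (Python for brevity; `schema_mismatches` and the column names in the usage note are hypothetical). It assumes the expected and actual columns are already available as name-to-type mappings. Fetching the actual columns, for instance from the warehouse's `information_schema.columns` view, is the expensive part and is left out here.

```python
from typing import Dict, List


def schema_mismatches(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Compare expected column types against the actual table definition.

    Both arguments map column name to type name. Extra columns present only
    in `actual` are ignored in this sketch.
    """
    problems = []
    for column, expected_type in expected.items():
        if column not in actual:
            problems.append(f"missing column {column}")
        elif actual[column].upper() != expected_type.upper():
            # Type names are compared case-insensitively.
            problems.append(
                f"column {column} has type {actual[column]}, expected {expected_type}"
            )
    return problems
```

With `expected = {"event_id": "VARCHAR", "collector_tstamp": "TIMESTAMP"}`, a table that lacks `collector_tstamp` yields a single "missing column" message, and a table that matches yields an empty list. Given the cost, a check like this probably only makes sense after a failed load, not on every batch.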