snowplow-rdb-loader

RDB Loader: add alerting for bad warehouse configuration

Open • istreeter opened this issue 3 years ago • 3 comments

We often see a warehouse become mis-configured unexpectedly. For example, a warehouse admin might remove a permission from the loader role, which then prevents the loader from loading a batch. The proposal is for the loader to query the warehouse to discover whether everything is configured as expected: for example, checking that the target table exists, that the loading stage exists, and that the loader role has been granted sufficient permissions.
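A minimal sketch of such a set of checks, assuming a hypothetical `Warehouse` interface with `table_exists`, `stage_exists` and `privileges_on` methods. None of these are RDB Loader APIs; a real implementation would back them with destination-specific queries. The required privilege set `{"INSERT", "SELECT"}` and the names `snowplow_stage` and `loader` are illustrative assumptions. Each check records a readable problem instead of raising, so a single run can report every issue at once.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol, Set, Tuple


class Warehouse(Protocol):
    """Hypothetical read-only view of the warehouse (not an RDB Loader API)."""

    def table_exists(self, table: str) -> bool: ...

    def stage_exists(self, stage: str) -> bool: ...

    def privileges_on(self, role: str, table: str) -> Set[str]: ...


@dataclass
class LoaderConfig:
    table: str
    stage: str
    role: str
    # Assumed minimum privileges; the real set depends on the destination.
    required_privileges: Set[str] = field(default_factory=lambda: {"INSERT", "SELECT"})


def check_configuration(wh: Warehouse, cfg: LoaderConfig) -> List[str]:
    """Run every check and return one message per problem (empty list = healthy)."""
    problems: List[str] = []
    if not wh.table_exists(cfg.table):
        problems.append(f"table {cfg.table} does not exist")
    if not wh.stage_exists(cfg.stage):
        problems.append(f"stage {cfg.stage} does not exist")
    missing = cfg.required_privileges - wh.privileges_on(cfg.role, cfg.table)
    if missing:
        problems.append(f"role {cfg.role} lacks {', '.join(sorted(missing))} on {cfg.table}")
    return problems


@dataclass
class FakeWarehouse:
    """In-memory stand-in used only to exercise the checks."""

    tables: Set[str]
    stages: Set[str]
    grants: Dict[Tuple[str, str], Set[str]]

    def table_exists(self, table: str) -> bool:
        return table in self.tables

    def stage_exists(self, stage: str) -> bool:
        return stage in self.stages

    def privileges_on(self, role: str, table: str) -> Set[str]:
        return self.grants.get((role, table), set())


if __name__ == "__main__":
    # Simulates an admin having revoked INSERT from the (hypothetical) loader role.
    wh = FakeWarehouse(
        tables={"atomic.events"},
        stages={"snowplow_stage"},
        grants={("loader", "atomic.events"): {"SELECT"}},
    )
    cfg = LoaderConfig(table="atomic.events", stage="snowplow_stage", role="loader")
    print(check_configuration(wh, cfg))  # ['role loader lacks INSERT on atomic.events']
```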

The loader is a long-running process, whereas these types of mis-configuration can arise at any time, so it is not sufficient to check only at startup. Instead, I suggest the loader runs these checks immediately after any batch fails to load. If the loader detects a mis-configuration, it should send an alert message.
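The "check after a failed load" trigger could be sketched as follows. `load_batch`, `run_checks` and `send_alert` are hypothetical hooks, not RDB Loader APIs. Re-raising the original error afterwards is an assumption, chosen so that whatever retry or error handling the loader already has keeps working unchanged.

```python
from typing import Callable, List


def load_with_diagnostics(
    load_batch: Callable[[], None],
    run_checks: Callable[[], List[str]],
    send_alert: Callable[[str], None],
) -> None:
    """Load one batch; if it fails, diagnose the warehouse before re-raising.

    The three callables are hypothetical hooks, not RDB Loader APIs.
    """
    try:
        load_batch()
    except Exception:
        # Checks run only after a failure, so successful loads pay no extra cost,
        # yet a mis-configuration introduced mid-run is still caught.
        problems = run_checks()
        if problems:
            send_alert("Warehouse mis-configuration detected: " + "; ".join(problems))
        # Re-raise so the loader's existing retry/error handling still applies.
        raise
```

Because the checks only run on the failure path, a warehouse that is healthy never sees the extra metadata queries.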

istreeter, Apr 01 '22 20:04

The old Snowflake Loader also had some additional checks. For example, we need to make sure that every incoming SQS message refers to a folder within the configured stage; otherwise the load will be a silent no-op.
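A sketch of the stage check, assuming each SQS message carries the S3 URI of the folder to load and that the stage's S3 URL is known from the loader configuration (both are assumptions; the message format is not described here). The function name `folder_within_stage` and the bucket names are hypothetical. Paths are compared with a trailing `/` so that `s3://sp-bucket/transformed2/` is not mistaken for a folder inside `s3://sp-bucket/transformed`.

```python
from urllib.parse import urlparse


def folder_within_stage(folder_uri: str, stage_uri: str) -> bool:
    """True if folder_uri lies under stage_uri (same scheme, bucket and path prefix)."""
    folder = urlparse(folder_uri)
    stage = urlparse(stage_uri)
    # A different scheme or bucket can never be inside the stage.
    if (folder.scheme, folder.netloc) != (stage.scheme, stage.netloc):
        return False
    # Normalise both paths to end with "/" so ".../transformed2/" is not
    # treated as inside ".../transformed".
    stage_path = stage.path.rstrip("/") + "/"
    folder_path = folder.path.rstrip("/") + "/"
    return folder_path.startswith(stage_path)


if __name__ == "__main__":
    stage = "s3://sp-bucket/transformed/"  # hypothetical stage URL
    print(folder_within_stage("s3://sp-bucket/transformed/run=2022-04-03-15-00-00/", stage))  # True
    print(folder_within_stage("s3://sp-bucket/archive/run=2022-04-03-15-00-00/", stage))  # False
```

A loader could reject, and alert on, any message for which this returns `False`, instead of issuing a load that silently does nothing.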

chuwy, Apr 03 '22 15:04

Regarding the change to this issue's title: I was thinking we could implement this for all destinations, not just Snowflake. Surely there are similar cases where a Redshift load fails because the loader lacks the required permissions?

istreeter, Apr 03 '22 18:04

I just talked to @stdfalse. He mentioned two very rare misconfigurations he has noticed:

  1. S3 bucket permissions (actually common with Snowflake)
  2. Redshift load role
  3. ...and I also think we can check table integrity (that tables match their expected schema), although this check can be relatively expensive
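A sketch of the table-integrity check: compare the expected columns and types with what the warehouse reports. To stay runnable it uses Python's built-in `sqlite3` and `PRAGMA table_info` as a stand-in; on Redshift or Snowflake the actual columns would instead come from a metadata query, for example against `information_schema.columns`. The expected schema below is illustrative only (not the real `atomic.events` definition), and extra columns in the warehouse are deliberately ignored, which is an assumption. Given the cost concern, such a check would fit the same "run only after a failed load" trigger.

```python
import sqlite3
from typing import Dict, List


def actual_columns(conn: sqlite3.Connection, table: str) -> Dict[str, str]:
    """Column name -> declared type, read via SQLite's PRAGMA (a stand-in for
    information_schema.columns). `table` must come from trusted config."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {name: col_type.upper() for _, name, col_type, _, _, _ in rows}


def schema_drift(expected: Dict[str, str], actual: Dict[str, str]) -> List[str]:
    """Report missing columns and type mismatches; extra columns are ignored."""
    problems: List[str] = []
    for name, col_type in expected.items():
        want = col_type.upper()
        if name not in actual:
            problems.append(f"missing column {name}")
        elif actual[name] != want:
            problems.append(f"column {name} is {actual[name]}, expected {want}")
    return problems


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (app_id TEXT, collector_tstamp TIMESTAMP)")
    # Illustrative expected schema, not the real atomic.events definition.
    expected = {"app_id": "TEXT", "collector_tstamp": "TIMESTAMP", "event_id": "TEXT"}
    print(schema_drift(expected, actual_columns(conn, "events")))  # ['missing column event_id']
```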

chuwy, Apr 04 '22 09:04