
ccl/backupccl: add log-based telemetry to backup and restore

Open rhu713 opened this issue 3 years ago • 1 comments

Backport 1/1 commits from #82463.

/cc @cockroachdb/release


Previously we didn't have logging for backup, backup schedule, and restore events in the telemetry structured logs. These logs are needed because they will be exported to Snowflake as part of the new telemetry system. This change adds logging of these events for every invoked backup and restore, and whenever a backup schedule is created. These events have the following format:

| Field name | Field type | Example value | Description | Field is valid for |
| --- | --- | --- | --- | --- |
| recovery_type | predefined list | {backup, scheduled backup, restore} | Did the user use backup, restore or scheduled backup? | all events |
| target_scope | predefined list | {cluster, database, table, schema} | What is the scope of the target object the user is backing up / restoring? | all events |
| is_multiregion_target | bool | true, false | Does the target contain objects with multi-region primitives? | all events |
| target_count | int | 3 | How many targets (databases, clusters, etc.) is the user backing up / restoring? | all events |
| destination_subdir_type | predefined list | {custom, standard, latest} | custom = custom name for their subdirectory, standard = date-based subdirectory, latest = latest subdirectory | all events |
| destination_storage_types | predefined list | {aws, gs, azure, http, nodelocal, userfile, other} | What is the cloud storage that the user is writing this backup to / restoring from? | all events |
| destination_auth_types | predefined list | {implicit, specified, other} | What authentication is used to access the cloud storage that the user wants to write the backup to / restore from? | all events |
| is_locality_aware | bool | true, false | Is this backup / restore locality aware? | all events |
| as_of_interval | int | relative time passed in AOST flag (e.g. -10s) | What system time does the user want to run this backup / restore as of? | all events |
| with_revision_history | bool | true, false | Does the backup include revision history? | all events |
| has_encryption_passphrase | bool | true, false | Did the user provide an encryption passphrase to encrypt / decrypt their backup? | all events |
| is_detached | bool | true, false | Did the user take a backup / restore with the detached flag? | all events |
| kms_type | predefined list | {aws, gcp, other, none} | Did the user provide a KMS to encrypt / decrypt the backup? Which KMS? | all events |
| kms_count | int | 2 | Did the user provide multiple KMSs to encrypt / decrypt the backup? How many? | all events |
| result_status | predefined list | succeeded, failed, canceled | What was the result code of the backup: did it succeed or fail? | all events |
| error_text | string | custom | What was the reason for failure? | all events |
| recurring_cron | string | default, custom crontab string (e.g. 1d) | How often does the user want to take a backup? (full or inc) | scheduled backups |
| full_backup_cron | string | default, always, custom crontab string (e.g. 1w) | How often does the user want to take a full backup? | scheduled backups |
| custom_first_run_time | int | timestamp | Did the user configure a custom first run time? | scheduled backups |
| on_execution_failure | predefined list | {retry, reschedule, pause, other} | What does the user want to do if the schedule fails to execute? | scheduled backups |
| on_previous_running | predefined list | {start, skip, wait} | What does the user want to do if the previous scheduled backup is still running? | scheduled backups |
| ignore_existing_backup | bool | {true, false} | If backups were already created in the destination that the new schedule references, is the new schedule backing up different objects? | scheduled backups |
| restore_options | list of predefined strings | ["into_db", "skip_missing_fk"] | Which restore options did the user use? | restore |
| into_db | entry in restore_options | restore_options : ["into_db"] | Did the user provide a new DB to restore the table(s) to? | restore (table-level) |
| rename_db | entry in restore_options | restore_options : ["rename_db"] | Did the user provide a new name for the restored DB? | restore (database-level) |
| skip_missing_fk | entry in restore_options | restore_options : ["skip_missing_fk"] | Does the user want to skip missing foreign keys on restore? | restore |
| skip_missing_sequences | entry in restore_options | restore_options : ["skip_missing_sequences"] | Does the user want to skip missing sequences on restore? | restore |
| skip_missing_views | entry in restore_options | restore_options : ["skip_missing_views"] | Does the user want to skip missing views on restore? | restore |
| skip_localities_check | entry in restore_options | restore_options : ["skip_localities_check"] | Does the user want to skip the check for mismatching localities on restore? | restore |
| debug_pause_on | predefined list | {error} | Does the user want to pause the restore if an error occurs? | restore |
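
For readers consuming these events downstream, the sketch below shows roughly what one event could look like as a JSON payload and how the predefined-list fields might be checked. It is illustrative only: the `RecoveryEvent` class, its `validate` and `to_json` methods, and the choice to omit unset fields are assumptions made for this example, not the types or encoding used in the CockroachDB code. Only the field names and allowed values come from the table above, and only a subset of fields is modeled.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Allowed values copied from the field table in this PR description.
RECOVERY_TYPES = {"backup", "scheduled backup", "restore"}
TARGET_SCOPES = {"cluster", "database", "table", "schema"}
RESULT_STATUSES = {"succeeded", "failed", "canceled"}


@dataclass
class RecoveryEvent:
    """Hypothetical in-memory form of one telemetry event (subset of fields)."""

    recovery_type: str
    target_scope: str
    target_count: int
    result_status: str
    is_multiregion_target: bool = False
    is_locality_aware: bool = False
    with_revision_history: bool = False
    is_detached: bool = False
    destination_storage_types: List[str] = field(default_factory=list)
    restore_options: List[str] = field(default_factory=list)
    error_text: Optional[str] = None

    def validate(self) -> None:
        """Reject values outside the predefined lists from the table."""
        if self.recovery_type not in RECOVERY_TYPES:
            raise ValueError(f"unknown recovery_type: {self.recovery_type!r}")
        if self.target_scope not in TARGET_SCOPES:
            raise ValueError(f"unknown target_scope: {self.target_scope!r}")
        if self.result_status not in RESULT_STATUSES:
            raise ValueError(f"unknown result_status: {self.result_status!r}")
        # The table marks restore_options as valid for restore events only.
        if self.restore_options and self.recovery_type != "restore":
            raise ValueError("restore_options is only valid for restore events")

    def to_json(self) -> str:
        """Serialize the event, omitting unset optional fields (an assumption)."""
        self.validate()
        payload = {k: v for k, v in asdict(self).items() if v not in (None, [])}
        return json.dumps(payload, sort_keys=True)


# Example: a table-level restore that used two restore options.
event = RecoveryEvent(
    recovery_type="restore",
    target_scope="table",
    target_count=3,
    result_status="succeeded",
    destination_storage_types=["gs"],
    restore_options=["into_db", "skip_missing_fk"],
)
print(event.to_json())
```

Keeping the individual restore options as entries in a single `restore_options` list, as the table describes, means a new option adds a list value and does not require a new column in the exported data.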

Release note (enterprise change): Backup, restore, and backup schedule creation now have corresponding events that are emitted to the telemetry channel.

rhu713, Sep 13 '22 22:09

This change is Reviewable

cockroach-teamcity, Sep 13 '22 22:09

> @livlobo has confirmed this is a high priority addition to 22.1?

That's correct!

rhu713, Sep 23 '22 13:09