
/backup/status should return the latest command status when no in-progress commands are executing

Open frankwg opened this issue 1 year ago • 9 comments

I used a locally built 2.5.0 for testing and found that the /backup/status endpoint returns an empty response after /backup/actions with {"command":"create_remote <backup_name>"} or /backup/upload/<local_backup_name> was issued. However, it returns the status correctly when the previous request was /backup/list or /backup/clean.

Note: use_embedded_backup_restore: true was used. Also, the upload to S3 was not successful.

frankwg avatar Apr 08 '24 05:04 frankwg

According to https://github.com/Altinity/clickhouse-backup/tree/master/ReadMe.md:

GET /backup/status Display list of currently running asynchronous operations: curl -s localhost:7171/backup/status | jq .

When it returns an empty list, it means no operation is currently running.

Check GET /backup/actions for the status and the list of all commands run via the API since the clickhouse-backup server started.
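
For example, a minimal way to check both endpoints (a sketch, assuming the API listens on localhost:7171 as in your config):

# /backup/status only lists operations that are still running,
# so an empty array simply means nothing is in progress
curl -s localhost:7171/backup/status | jq .

# /backup/actions lists every command run via the API since the server
# started, together with its status and any error message
curl -s localhost:7171/backup/actions | jq .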

Slach avatar Apr 08 '24 07:04 Slach

Also, the upload to S3 was not successful.

Do you have logs? How did you start the clickhouse-backup server? Is it standalone, Docker, or Kubernetes?

Slach avatar Apr 08 '24 08:04 Slach

I was using the clickhouse-operator and a MinIO deployment from the README.md:

apiVersion: v1
kind: Secret
metadata:
  name: clickhouse-backup-config
stringData:
  config.yml: |
    general:
      remote_storage: s3
      log_level: debug
      restore_schema_on_cluster: "{cluster}"
      allow_empty_backups: true
      backups_to_keep_remote: 3
    clickhouse:
      use_embedded_backup_restore: true
      embedded_backup_disk: backups
      timeout: 4h
      skip_table_engines:
        - GenerateRandom
    api:
      listen: "0.0.0.0:7171"
      create_integration_tables: true
    s3:
      acl: private
      endpoint: http://s3-backup-minio:9000
      bucket: clickhouse
      path: backup/shard-{shard}
      access_key: backup-access-key
      secret_key: backup-secret-key
      force_path_style: true
      disable_ssl: true
      debug: true

---
apiVersion: "clickhouse.altinity.com/v1"
kind: "ClickHouseInstallation"
metadata:
  name: one-sidecar-embedded
spec:
  defaults:
    templates:
      podTemplate: clickhouse-backup
      dataVolumeClaimTemplate: data-volume
  configuration:
    profiles:
      default/distributed_ddl_task_timeout: 14400
    files:
      config.d/backup_disk.xml: |
        <clickhouse>
          <storage_configuration>
            <disks>
              <backups>
                <type>local</type>
                <path>/var/lib/clickhouse/backups/</path>
              </backups>
            </disks>
          </storage_configuration>
          <backups>
            <allowed_disk>backups</allowed_disk>
            <allowed_path>backups/</allowed_path>
          </backups>
        </clickhouse>     
    settings:
      # to allow scrape metrics via embedded prometheus protocol
      prometheus/endpoint: /metrics
      prometheus/port: 8888
      prometheus/metrics: true
      prometheus/events: true
      prometheus/asynchronous_metrics: true
      # need install zookeeper separately, look to https://github.com/Altinity/clickhouse-operator/tree/master/deploy/zookeeper/ for details
    zookeeper:
      nodes:
        - host: zookeeper
          port: 2181
      session_timeout_ms: 5000
      operation_timeout_ms: 5000
    clusters:
      - name: default
        layout:
          # 2 shards one replica in each
          shardsCount: 2
          replicas:
            - templates:
                podTemplate: pod-with-backup
            - templates:
                podTemplate: pod-clickhouse-only
  templates:
    volumeClaimTemplates:
      - name: data-volume
        spec:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
    podTemplates:
      - name: pod-with-backup
        metadata:
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: '8888'
            prometheus.io/path: '/metrics'
            # need separate prometheus scrape config, look to https://github.com/prometheus/prometheus/issues/3756
            clickhouse.backup/scrape: 'true'
            clickhouse.backup/port: '7171'
            clickhouse.backup/path: '/metrics'
        spec:
          securityContext:
            runAsUser: 101
            runAsGroup: 101
            fsGroup: 101
          containers:
            - name: clickhouse-pod
              image: clickhouse/clickhouse-server
              command:
                - clickhouse-server
                - --config-file=/etc/clickhouse-server/config.xml
            - name: clickhouse-backup
              image: clickhouse-backup:build-docker
              # image: altinity/clickhouse-backup:master
              imagePullPolicy: IfNotPresent
              command:
                # - bash
                # - -xc
                # - "/bin/clickhouse-backup server"
                - "/src/build/linux/amd64/clickhouse-backup"
                - "server"
                # require to avoid double scraping clickhouse and clickhouse-backup containers
              ports:
                - name: backup-rest
                  containerPort: 7171
              volumeMounts:
                - name: config-volume
                  mountPath: /etc/clickhouse-backup/config.yml
                  subPath: config.yml
          volumes:
            - name: config-volume
              secret:
                secretName: clickhouse-backup-config
      - name: pod-clickhouse-only
        metadata:
          annotations:
            prometheus.io/scrape: 'true'
            prometheus.io/port: '8888'
            prometheus.io/path: '/metrics'
        spec:
          securityContext:
            runAsUser: 101
            runAsGroup: 101
            fsGroup: 101
          containers:
            - name: clickhouse-pod
              image: clickhouse/clickhouse-server
              command:
                - clickhouse-server
                - --config-file=/etc/clickhouse-server/config.xml

frankwg avatar Apr 08 '24 08:04 frankwg

Logs from clickhouse-backup:

2024/04/08 08:13:32.460629 debug calculate parts list `default`.`join` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024.04.08 08:13:32.456847 [ 10 ] {699cffe9-4cd2-4465-9483-8c090e67dd38} <Debug> executeQuery: (from 127.0.0.1:53702) SELECT sum(total_bytes) AS backup_data_size FROM system.tables WHERE concat(database,'.',name) IN ('default.join', 'default.table_for_dict', 'default.set', 'default.merge', 'default.memory', 'default.stripelog', 'default.log', 'default.tinylog', 'default.buffer', 'default.ndict', 'default.null', 'default.dict', 'default.distributed', 'default.generate_random') (stage: Complete)
2024/04/08 08:13:32.461204 debug /var/lib/clickhouse/backups/shard0/metadata/default/join.json created logger=backuper
2024/04/08 08:13:32.461226 debug calculate parts list `default`.`table_for_dict` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.461462 debug /var/lib/clickhouse/backups/shard0/metadata/default/table_for_dict.json created logger=backuper
2024/04/08 08:13:32.461482 debug calculate parts list `default`.`set` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.461629 debug /var/lib/clickhouse/backups/shard0/metadata/default/set.json created logger=backuper
2024/04/08 08:13:32.461639 debug calculate parts list `default`.`merge` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.461781 debug /var/lib/clickhouse/backups/shard0/metadata/default/merge.json created logger=backuper
2024/04/08 08:13:32.461798 debug calculate parts list `default`.`memory` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.461953 debug /var/lib/clickhouse/backups/shard0/metadata/default/memory.json created logger=backuper
2024/04/08 08:13:32.461968 debug calculate parts list `default`.`stripelog` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462110 debug /var/lib/clickhouse/backups/shard0/metadata/default/stripelog.json created logger=backuper
2024/04/08 08:13:32.462132 debug calculate parts list `default`.`log` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462393 debug /var/lib/clickhouse/backups/shard0/metadata/default/log.json created logger=backuper
2024/04/08 08:13:32.462411 debug calculate parts list `default`.`tinylog` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462560 debug /var/lib/clickhouse/backups/shard0/metadata/default/tinylog.json created logger=backuper
2024/04/08 08:13:32.462577 debug calculate parts list `default`.`buffer` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462777 debug /var/lib/clickhouse/backups/shard0/metadata/default/buffer.json created logger=backuper
2024/04/08 08:13:32.462795 debug calculate parts list `default`.`ndict` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.462952 debug /var/lib/clickhouse/backups/shard0/metadata/default/ndict.json created logger=backuper
2024/04/08 08:13:32.462969 debug calculate parts list `default`.`null` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.463084 debug /var/lib/clickhouse/backups/shard0/metadata/default/null.json created logger=backuper
2024/04/08 08:13:32.463103 debug calculate parts list `default`.`dict` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.463209 debug /var/lib/clickhouse/backups/shard0/metadata/default/dict.json created logger=backuper
2024/04/08 08:13:32.463225 debug calculate parts list `default`.`distributed` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.463332 debug /var/lib/clickhouse/backups/shard0/metadata/default/distributed.json created logger=backuper
2024/04/08 08:13:32.463347 debug calculate parts list `default`.`generate_random` from embedded backup disk `backups` backup=shard0 logger=backuper operation=create
2024/04/08 08:13:32.463453 debug /var/lib/clickhouse/backups/shard0/metadata/default/generate_random.json created logger=backuper
2024/04/08 08:13:32.463478  info SELECT value FROM `system`.`build_options` WHERE name='VERSION_DESCRIBE' logger=clickhouse
2024/04/08 08:13:32.465587 debug /var/lib/clickhouse/backups/shard0/metadata.json created logger=backuper
2024/04/08 08:13:32.465614  info done                      backup=shard0 duration=183ms logger=backuper operation=create_embedded
2024/04/08 08:13:32.465666  info clickhouse connection closed logger=clickhouse
2024/04/08 08:13:32.465752  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:32.466832  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:32.466868 error `general->remote_storage: s3` `clickhouse->use_embedded_backup_restore: true` require s3->compression_format: none, actual tar logger=validateUploadParams

frankwg avatar Apr 08 '24 08:04 frankwg

OK, the configuration looks correct. Could you share logs with the upload failures?

kubectl logs -n <your-namespace> pod/chi-one-sidecar-embedded-default-0-0-0 -c clickhouse-backup --since=24h

One suggestion:

        <disks>
          <backups>
            <type>local</type>
            <path>/var/lib/clickhouse/backups/</path>
          </backups>
        </disks>

makes sense only for standalone hardware servers where /var/lib/clickhouse/backups/ is mounted as a separate HDD, for example.

In Kubernetes it is better to use an s3-type backup disk; see the examples in https://github.com/Altinity/clickhouse-backup/blob/master/test/integration/config-s3-embedded.yml#L23-L32 and

https://github.com/Altinity/clickhouse-backup/blob/master/test/integration/dynamic_settings.sh#L214-L238
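
To double-check which disk types ClickHouse actually sees (the same system.disks query appears in your logs above), a quick sketch using the pod and container names from your manifest:

# show every disk ClickHouse knows about, including the embedded backup disk
kubectl exec -n <your-namespace> chi-one-sidecar-embedded-default-0-0-0 -c clickhouse-pod -- \
  clickhouse-client -q "SELECT name, type, path FROM system.disks"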

Slach avatar Apr 08 '24 08:04 Slach

kubectl logs -n backup pod/chi-one-sidecar-embedded-default-0-0-0 -c clickhouse-backup --since=24h
2024/04/08 08:13:33.799590  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.803256  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.803397  info Create integration tables logger=server
2024/04/08 08:13:33.803439  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.804733  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.804776  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/04/08 08:13:33.810063  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/04/08 08:13:33.816762  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/04/08 08:13:33.823185  info SELECT engine FROM system.databases WHERE name = 'system' logger=clickhouse
2024/04/08 08:13:33.827989  info DROP TABLE IF EXISTS `system`.`backup_actions` NO DELAY logger=clickhouse
2024/04/08 08:13:33.829256  info CREATE TABLE system.backup_actions (command String, start DateTime, finish DateTime, status String, error String) ENGINE=URL('http://127.0.0.1:7171/backup/actions', JSONEachRow) SETTINGS input_format_skip_unknown_fields=1 logger=clickhouse
2024/04/08 08:13:33.836864  info SELECT engine FROM system.databases WHERE name = 'system' logger=clickhouse
2024/04/08 08:13:33.841018  info DROP TABLE IF EXISTS `system`.`backup_list` NO DELAY logger=clickhouse
2024/04/08 08:13:33.842460  info CREATE TABLE system.backup_list (name String, created DateTime, size Int64, location String, required String, desc String) ENGINE=URL('http://127.0.0.1:7171/backup/list', JSONEachRow) SETTINGS input_format_skip_unknown_fields=1 logger=clickhouse
2024/04/08 08:13:33.849179  info clickhouse connection closed logger=clickhouse
2024/04/08 08:13:33.849778  info Starting API server on 0.0.0.0:7171 logger=server.Run
2024/04/08 08:13:33.852641  info Update backup metrics start (onlyLocal=false) logger=server
2024/04/08 08:13:33.852713  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.852800  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.854251  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.854283  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/04/08 08:13:33.854455  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.854483  info SELECT value FROM `system`.`build_options` where name='VERSION_INTEGER' logger=clickhouse
2024/04/08 08:13:33.856983  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/04/08 08:13:33.857263  info SELECT countIf(name='type') AS is_disk_type_present, countIf(name='object_storage_type') AS is_object_storage_type_present, countIf(name='free_space') AS is_free_space_present, countIf(name='disks') AS is_storage_policy_present FROM system.columns WHERE database='system' AND table IN ('disks','storage_policies')  logger=clickhouse
2024/04/08 08:13:33.869224  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/04/08 08:13:33.878475 error ResumeOperationsAfterRestart return error: open /var/lib/clickhouse/backup: no such file or directory logger=server.Run
2024/04/08 08:13:33.880908  info SELECT d.path, any(d.name) AS name, any(lower(if(d.type='ObjectStorage',d.object_storage_type,d.type))) AS type, min(d.free_space) AS free_space, groupUniqArray(s.policy_name) AS storage_policies FROM system.disks AS d  LEFT JOIN (SELECT policy_name, arrayJoin(disks) AS disk FROM system.storage_policies) AS s ON s.disk = d.name GROUP BY d.path logger=clickhouse
2024/04/08 08:13:33.886506  info clickhouse connection closed logger=clickhouse
2024/04/08 08:13:33.886537  info clickhouse connection prepared: tcp://localhost:9000 run ping logger=clickhouse
2024/04/08 08:13:33.889603  info clickhouse connection success: tcp://localhost:9000 logger=clickhouse
2024/04/08 08:13:33.889652  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/04/08 08:13:33.893202  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/04/08 08:13:33.895191  info SELECT count() AS is_macros_exists FROM system.tables WHERE database='system' AND name='macros'  SETTINGS empty_result_for_aggregation_by_empty_set=0 logger=clickhouse
2024/04/08 08:13:33.898985  info SELECT macro, substitution FROM system.macros logger=clickhouse
2024/04/08 08:13:33.901912  info [s3:DEBUG] Request
GET /clickhouse?versioning= HTTP/1.1
Host: s3-backup-minio:9000
User-Agent: aws-sdk-go-v2/1.26.1 os/linux lang/go#1.22.1 md/GOOS#linux md/GOARCH#amd64 api/s3#1.53.1
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: f4e5a6fd-30db-45f1-ad62-5da57bdcf3ed
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=backup-access-key/20240408/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=ae5b666f1e1b560bc36e259126e21e8a28555fdd75d99dcd234b753eb902eff9
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240408T081333Z


2024/04/08 08:13:33.904818  info [s3:DEBUG] Response
HTTP/1.1 200 OK
Content-Length: 99
Accept-Ranges: bytes
Content-Type: application/xml
Date: Mon, 08 Apr 2024 08:13:33 GMT
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
X-Amz-Request-Id: 17C43FE5A7DC2AD8
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block


2024/04/08 08:13:33.905007 debug /tmp/.clickhouse-backup-metadata.cache.S3 not found, load 0 elements logger=s3
2024/04/08 08:13:33.905839  info [s3:DEBUG] Request
GET /clickhouse?delimiter=%2F&list-type=2&max-keys=1000&prefix=backup%2Fshard-0%2F HTTP/1.1
Host: s3-backup-minio:9000
User-Agent: aws-sdk-go-v2/1.26.1 os/linux lang/go#1.22.1 md/GOOS#linux md/GOARCH#amd64 api/s3#1.53.1
Accept-Encoding: identity
Amz-Sdk-Invocation-Id: 1225429f-7ccc-4a8c-aad4-32d681253c34
Amz-Sdk-Request: attempt=1; max=3
Authorization: AWS4-HMAC-SHA256 Credential=backup-access-key/20240408/us-east-1/s3/aws4_request, SignedHeaders=accept-encoding;amz-sdk-invocation-id;amz-sdk-request;host;x-amz-content-sha256;x-amz-date, Signature=418ee0c64337f86817a6c3eb53ca41834c2e47f36bd8823fa4a598fdbd242913
X-Amz-Content-Sha256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
X-Amz-Date: 20240408T081333Z


2024/04/08 08:13:33.907610  info [s3:DEBUG] Response
HTTP/1.1 200 OK
Content-Length: 280
Accept-Ranges: bytes
Content-Type: application/xml
Date: Mon, 08 Apr 2024 08:13:33 GMT
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Id-2: dd9025bab4ad464b049177c95eb6ebf374d3b3fd1af9251148b658df7ac2e3e8
X-Amz-Request-Id: 17C43FE5A800167C
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block


2024/04/08 08:13:33.908370 debug /tmp/.clickhouse-backup-metadata.cache.S3 save 0 elements logger=s3
2024/04/08 08:13:33.908473  info clickhouse connection closed logger=clickhouse
2024/04/08 08:13:33.908509  info Update backup metrics finish LastBackupCreateLocal=2024-04-08 08:13:32.463475027 +0000 UTC LastBackupCreateRemote=<nil> LastBackupSizeLocal=13636 LastBackupSizeRemote=0 LastBackupUpload=<nil> NumberBackupsLocal=1 NumberBackupsRemote=0 duration=56ms logger=server

frankwg avatar Apr 08 '24 08:04 frankwg

The root cause is in the logs:

error general->remote_storage: s3 clickhouse->use_embedded_backup_restore: true require s3->compression_format: none, actual tar logger=validateUploadParams

Just add

s3:
  compression_format: none

into your secret.
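
After updating the secret and restarting the clickhouse-backup container, a quick way to re-run and check the result (a sketch, reusing the shard0 backup name from the logs above and the default API port):

# re-issue the remote backup through the actions endpoint
curl -s -X POST localhost:7171/backup/actions -d '{"command":"create_remote shard0"}'

# then inspect the recorded status and error for each command
curl -s localhost:7171/backup/actions | jq .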

Slach avatar Apr 08 '24 08:04 Slach

Would you mind adding the following instead of an empty string as the response?

{
  "command": "create_remote <backup_name>",
  "status": "error",
  "start": "2024-03-26 08:15:42",
  "finish": "2024-03-26 08:17:12",
  "error": "`general->remote_storage: s3` `clickhouse->use_embedded_backup_restore: true` require s3->compression_format: none"
}
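
In the meantime, the same fields are already exposed through the system.backup_actions integration table that the server creates at startup (see the CREATE TABLE in the logs above); a sketch for pulling the latest command status, assuming the pod and container names from the manifest:

# read the most recent command's status and error via the integration table
kubectl exec -n <your-namespace> chi-one-sidecar-embedded-default-0-0-0 -c clickhouse-pod -- \
  clickhouse-client -q "SELECT command, status, error FROM system.backup_actions ORDER BY start DESC LIMIT 1"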

frankwg avatar Apr 09 '24 01:04 frankwg

@frankwg good suggestion, thanks

Slach avatar Apr 09 '24 18:04 Slach