Failed to scale in a node with capacity = "1EiB"

Open JaySon-Huang opened this issue 1 year ago • 2 comments

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?

We tried to set up a TiFlash node that stores its data in S3 and reports its capacity as 1EiB (1024 PiB). However, tiup fails to scale in such a TiFlash node.

    {
      "store": {
        "id": 105,
        "address": "172.16.5.85:9520",
        "labels": [
          {
            "key": "engine_role",
            "value": "write"
          },
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v6.7.0-alpha-124-g57b7c4c5a1",
        "peer_address": "172.16.5.85:9020",
        "status_address": "172.16.5.85:20020",
        "git_hash": "57b7c4c5a116be40a25d1cc5f4b39b225f153015",
        "start_timestamp": 1678704391,
        "deploy_path": "/data1/jaysonhuang/tiupd/deploy/tiflash-5020/bin/tiflash",
        "last_heartbeat": 1678704625878475560,
        "node_state": 1,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1EiB",
        "available": "1024PiB",
        "used_size": "615B",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 2,
        "region_weight": 1,
        "region_score": 96,
        "region_size": 96,
        "witness_count": 0,
        "slow_score": 1,
        "slow_trend": {
          "cause_value": 250743.00833333333,
          "cause_rate": 0,
          "result_value": 0,
          "result_rate": 0
        },
        "receiving_snap_count": 1,
        "start_ts": "2023-03-13T18:46:31+08:00",
        "last_heartbeat_ts": "2023-03-13T18:50:25.87847556+08:00",
        "uptime": "3m54.87847556s"
      }
    }
>  tiup cluster scale-in j1 -N 172.16.5.85:5030 -y
tiup is checking updates for component cluster ...
Starting component `cluster`: /data2/rastorage_common/.tiup/components/cluster/v1.11.3/tiup-cluster scale-in j1 -N 172.16.5.85:5030 -y
...
+ [ Serial ] - ClusterOperate: operation=DestroyOperation, options={Roles:[] Nodes:[172.16.5.85:5030] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:600 IgnoreConfigCheck:false NativeSSH:false SSHType: Concurrency:5 SSHProxyHost: SSHProxyPort:22 SSHProxyUser:jaysonhuang SSHProxyIdentity:/data1/jaysonhuang/.ssh/id_rsa SSHProxyUsePassword:false SSHProxyTimeout:5 CleanupData:false CleanupLog:false CleanupAuditLog:false RetainDataRoles:[] RetainDataNodes:[] DisplayMode:default Operation:StartOperation}

Error: failed to destroy: invalid suffix: 'eib'

  2. What did you expect to see?

  3. What did you see instead?

2023-03-13T18:50:48.631+0800    INFO    + [ Serial ] - ClusterOperate: operation=DestroyOperation, options={Roles:[] Nodes:[172.16.5.85:5030] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:600 IgnoreConfigCheck:false NativeSSH:false SSHType: Concurrency:5 SSHProxyHost: SSHProxyPort:22 SSHProxyUser:jaysonhuang SSHProxyIdentity:/data1/jaysonhuang/.ssh/id_rsa SSHProxyUsePassword:false SSHProxyTimeout:5 CleanupData:false CleanupLog:false CleanupAuditLog:false RetainDataRoles:[] RetainDataNodes:[] DisplayMode:default Operation:StartOperation}
2023-03-13T18:50:48.631+0800    DEBUG   TaskBegin       {"task": "ClusterOperate: operation=DestroyOperation, options={Roles:[] Nodes:[172.16.5.85:5030] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:600 IgnoreConfigCheck:false NativeSSH:false SSHType: Concurrency:5 SSHProxyHost: SSHProxyPort:22 SSHProxyUser:jaysonhuang SSHProxyIdentity:/data1/jaysonhuang/.ssh/id_rsa SSHProxyUsePassword:false SSHProxyTimeout:5 CleanupData:false CleanupLog:false CleanupAuditLog:false RetainDataRoles:[] RetainDataNodes:[] DisplayMode:default Operation:StartOperation}"}
2023-03-13T18:50:48.641+0800    DEBUG   TaskFinish      {"task": "ClusterOperate: operation=DestroyOperation, options={Roles:[] Nodes:[172.16.5.85:5030] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:600 IgnoreConfigCheck:false NativeSSH:false SSHType: Concurrency:5 SSHProxyHost: SSHProxyPort:22 SSHProxyUser:jaysonhuang SSHProxyIdentity:/data1/jaysonhuang/.ssh/id_rsa SSHProxyUsePassword:false SSHProxyTimeout:5 CleanupData:false CleanupLog:false CleanupAuditLog:false RetainDataRoles:[] RetainDataNodes:[] DisplayMode:default Operation:StartOperation}", "error": "failed to destroy: invalid suffix: 'eib'", "errorVerbose": "invalid suffix: 'eib'\ngithub.com/pingcap/tiup/pkg/cluster/api/typeutil.(*ByteSize).UnmarshalJSON\n\tgithub.com/pingcap/tiup/pkg/cluster/api/typeutil/size.go:45\nencoding/json.(*decodeState).literalStore\n\tencoding/json/decode.go:872\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:388\nencoding/json.(*decodeState).object\n\tencoding/json/decode.go:775\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:374\nencoding/json.(*decodeState).object\n\tencoding/json/decode.go:775\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:374\nencoding/json.(*decodeState).array\n\tencoding/json/decode.go:562\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:364\nencoding/json.(*decodeState).object\n\tencoding/json/decode.go:775\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:374\nencoding/json.(*decodeState).unmarshal\n\tencoding/json/decode.go:181\nencoding/json.Unmarshal\n\tencoding/json/decode.go:108\ngithub.com/pingcap/tiup/pkg/cluster/api.(*PDClient).GetStores.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:214\ngithub.com/pingcap/tiup/pkg/cluster/api.tryURLs\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:159\ngithub.com/pingcap/tiup/pkg/cluster/api.(*PDClient).GetStores\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:208\ngithub.com/pingcap/tiup/pkg/cluster/api.(*PDClient).GetCurrentStore\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:232\ngithub.com/pingcap/tiup/pkg/cluster/api.(*PDClient).DelStore\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:665\ngithub.com/pingcap/tiup/pkg/cluster/operation.deleteMember\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/scale_in.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.ScaleInCluster\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/scale_in.go:292\ngithub.com/pingcap/tiup/pkg/cluster/operation.ScaleIn\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/scale_in.go:83\ngithub.com/pingcap/tiup/pkg/cluster/task.(*ClusterOperate).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/action.go:51\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).ScaleIn\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/scale_in.go:135\ngithub.com/pingcap/tiup/components/cluster/command.newScaleInCmd.func1\n\tgithub.com/pingcap/tiup/components/cluster/command/scale_in.go:51\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:872\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:990\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:918\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup/components/cluster/command/root.go:297\nmain.main\n\tgithub.com/pingcap/tiup/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:250\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to destroy"}
2023-03-13T18:50:48.641+0800    INFO    Execute command finished        {"code": 1, "error": "failed to destroy: invalid suffix: 'eib'", "errorVerbose": "invalid suffix: 'eib'\ngithub.com/pingcap/tiup/pkg/cluster/api/typeutil.(*ByteSize).UnmarshalJSON\n\tgithub.com/pingcap/tiup/pkg/cluster/api/typeutil/size.go:45\nencoding/json.(*decodeState).literalStore\n\tencoding/json/decode.go:872\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:388\nencoding/json.(*decodeState).object\n\tencoding/json/decode.go:775\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:374\nencoding/json.(*decodeState).object\n\tencoding/json/decode.go:775\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:374\nencoding/json.(*decodeState).array\n\tencoding/json/decode.go:562\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:364\nencoding/json.(*decodeState).object\n\tencoding/json/decode.go:775\nencoding/json.(*decodeState).value\n\tencoding/json/decode.go:374\nencoding/json.(*decodeState).unmarshal\n\tencoding/json/decode.go:181\nencoding/json.Unmarshal\n\tencoding/json/decode.go:108\ngithub.com/pingcap/tiup/pkg/cluster/api.(*PDClient).GetStores.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:214\ngithub.com/pingcap/tiup/pkg/cluster/api.tryURLs\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:159\ngithub.com/pingcap/tiup/pkg/cluster/api.(*PDClient).GetStores\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:208\ngithub.com/pingcap/tiup/pkg/cluster/api.(*PDClient).GetCurrentStore\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:232\ngithub.com/pingcap/tiup/pkg/cluster/api.(*PDClient).DelStore\n\tgithub.com/pingcap/tiup/pkg/cluster/api/pdapi.go:665\ngithub.com/pingcap/tiup/pkg/cluster/operation.deleteMember\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/scale_in.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.ScaleInCluster\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/scale_in.go:292\ngithub.com/pingcap/tiup/pkg/cluster/operation.ScaleIn\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/scale_in.go:83\ngithub.com/pingcap/tiup/pkg/cluster/task.(*ClusterOperate).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/action.go:51\ngithub.com/pingcap/tiup/pkg/cluster/task.(*Serial).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/task/task.go:86\ngithub.com/pingcap/tiup/pkg/cluster/manager.(*Manager).ScaleIn\n\tgithub.com/pingcap/tiup/pkg/cluster/manager/scale_in.go:135\ngithub.com/pingcap/tiup/components/cluster/command.newScaleInCmd.func1\n\tgithub.com/pingcap/tiup/components/cluster/command/scale_in.go:51\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:872\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:990\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:918\ngithub.com/pingcap/tiup/components/cluster/command.Execute\n\tgithub.com/pingcap/tiup/components/cluster/command/root.go:297\nmain.main\n\tgithub.com/pingcap/tiup/components/cluster/main.go:23\nruntime.main\n\truntime/proc.go:250\nruntime.goexit\n\truntime/asm_amd64.s:1594\nfailed to destroy"}

  4. What version of TiUP are you using (tiup --version)?
>  tiup --version
1.11.0 tiup
Go Version: go1.19.1
Git Ref: v1.11.0
GitHash: 1a4714d7146fa85240a1ff4ef7451df719e0b4f0

JaySon-Huang, Mar 13 '23 10:03

Workaround: set the capacity to 1 PiB.

JaySon-Huang, Mar 13 '23 11:03

EiB is too big for the Go module we use :( See https://github.com/docker/go-units/blob/master/size.go

nexustar, Mar 15 '23 08:03