persist resource group state failed
Bug Report
What did you do?
The PD leader logs occasionally report some errors:
What did you expect to see?
What did you see instead?
What version of PD are you using (pd-server -V)?
v8.5.1
Can you provide more logs from around this time, including TiDB's logs? Also, what operations were performed at that time?
Nothing was done at that time; the error log is printed continuously.
[2025/09/29 15:00:13.442 +08:00] [ERROR] [manager.go:352] ["persist resource group state failed"] [error="[PD:json:ErrJSONMarshal]failed to marshal json: json: unsupported value: NaN"]
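For reference, Go's `encoding/json` refuses to marshal NaN (and ±Inf) float values, which is the underlying error wrapped by `ErrJSONMarshal` here. A minimal demonstration:

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

func main() {
	// Any struct field holding NaN makes json.Marshal fail with
	// "json: unsupported value: NaN".
	state := struct {
		Tokens float64 `json:"tokens"`
	}{Tokens: math.NaN()}

	_, err := json.Marshal(state)
	fmt.Println(err) // json: unsupported value: NaN
}
```

So the persist failure itself is only a symptom: some float field in the in-memory resource group state has become NaN.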
Is this the first time the error occurred? I would appreciate the logs before and after the first error. @geeklc
The logs around the first error: 2025-09.zip pd0919.log
I noticed the PD leader resigned here:
[2025/09/19 21:53:13.465 +08:00] [INFO] [server.go:1768] ["PD leader is ready to serve"] [leader-name=pd-2]
Before the PD leader transfer, did this error occur on the previous PD leader? I need to check whether this is the first time the error occurred.
This is the first error in the PD logs, and the TiDB logs have been cleared: pd0809_37.log
From the latest log you provided, I can see that the error suddenly appeared on August 9th. I suspect it was triggered by a create/alter resource group operation at that time. Could you provide the Resource Group settings you used? This would be more helpful for further investigation. @geeklc
[2025/08/09 17:04:49.004 +08:00] [INFO] [grpc_service.go:100] ["watch request"] [key=resource_group/settings]
I can’t quite remember the exact operations at that time; below is the current resource group:
I also encountered this problem, and it was likewise caused by the use of resource groups. The process had been running for about half a month.
version: v7.1.1
panic: json: unsupported value: NaN
goroutine 226125595 [running]:
github.com/tikv/pd/pkg/mcs/resource_manager/server.(*ResourceGroup).Copy(0x2a43780?)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/pkg/mcs/resource_manager/server/resource_group.go:68 +0x12c
github.com/tikv/pd/pkg/mcs/resource_manager/server.(*Manager).GetResourceGroup(0xc019ecf710?, {0xc160135158?, 0x2b237e0?})
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/pkg/mcs/resource_manager/server/manager.go:225 +0xc5
github.com/tikv/pd/pkg/mcs/resource_manager/server.(*Service).GetResourceGroup(0xc0015b9cc8?, {0xc0002177d0?, 0xc08912b6e0?}, 0xc08912b5c0?)
/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/pkg/mcs/resource_manager/server/grpc_service.go:100 +0x8a
github.com/pingcap/kvproto/pkg/resource_manager._ResourceManager_GetResourceGroup_Handler.func1({0x3ac01d8, 0xc08912b590}, {0x2c701e0?, 0xc08912b5c0})
/go/pkg/mod/github.com/pingcap/[email protected]/pkg/resource_manager/resource_manager.pb.go:1886 +0x78
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1({0x3ac01d8?, 0xc08912b590?}, {0x2c701e0?, 0xc08912b5c0?})
/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:31 +0x89
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1({0x3ac01d8, 0xc08912b590}, {0x2c701e0, 0xc08912b5c0}, 0xc04d12a540?, 0xc160178050)
/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/server_metrics.go:107 +0x87
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1({0x3ac01d8?, 0xc08912b590?}, {0x2c701e0?, 0xc08912b5c0?})
/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34 +0x6f
go.etcd.io/etcd/etcdserver/api/v3rpc.newUnaryInterceptor.func1({0x3ac01d8, 0xc08912b590}, {0x2c701e0?, 0xc08912b5c0}, 0x0?, 0xc160178050)
/go/pkg/mod/go.etcd.io/[email protected]/etcdserver/api/v3rpc/interceptor.go:70 +0x2a2
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1({0x3ac01d8?, 0xc08912b590?}, {0x2c701e0?, 0xc08912b5c0?})
/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:34 +0x6f
go.etcd.io/etcd/etcdserver/api/v3rpc.newLogUnaryInterceptor.func1({0x3ac01d8, 0xc08912b590}, {0x2c701e0, 0xc08912b5c0}, 0xc130bd2060, 0xc160178050)
/go/pkg/mod/go.etcd.io/[email protected]/etcdserver/api/v3rpc/interceptor.go:77 +0xc3
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1({0x3ac01d8, 0xc08912b590}, {0x2c701e0, 0xc08912b5c0}, 0xc130bd2060, 0xc04b879938)
/go/pkg/mod/github.com/grpc-ecosystem/[email protected]/chain.go:39 +0x1a3
github.com/pingcap/kvproto/pkg/resource_manager._ResourceManager_GetResourceGroup_Handler({0x2c193a0?, 0xc0015b9cc8}, {0x3ac01d8, 0xc08912b590}, 0xc16014ec00, 0xc0038d0060)
/go/pkg/mod/github.com/pingcap/[email protected]/pkg/resource_manager/resource_manager.pb.go:1888 +0x138
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0038bcc00, {0x3acf7e0, 0xc02531e480}, 0xc1190d8b00, 0xc0038dc8d0, 0x4d27598, 0x0)
/go/pkg/mod/google.golang.org/[email protected]/server.go:1024 +0xd5e
google.golang.org/grpc.(*Server).handleStream(0xc0038bcc00, {0x3acf7e0, 0xc02531e480}, 0xc1190d8b00, 0x0)
/go/pkg/mod/google.golang.org/[email protected]/server.go:1313 +0xa25
google.golang.org/grpc.(*Server).serveStreams.func1.1()
/go/pkg/mod/google.golang.org/[email protected]/server.go:722 +0x98
created by google.golang.org/grpc.(*Server).serveStreams.func1
/go/pkg/mod/google.golang.org/[email protected]/server.go:720 +0xea
[2025/10/10 10:44:08.907 +08:00] [WARN] [retry_interceptor.go:62] ["retrying of unary invoker failed"] [target=endpoint://client-97396e1f-0c9c-4847-8610-9066d0a607f3/172.31.102.49:2379] [attempt=0] [error="rpc error: code = DeadlineExceeded desc = context deadline exceeded"]
Please help take a look @JmPotato @nolouch @glorv
Similar to #7206, there must be a race condition where the resource group is read and updated at the same time.
Wasn’t #7206 already fixed in version v7? Are there still related issues in v8.5.1? Should we confirm the scope of impact?
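If the race hypothesis above holds (a reader copying the group while a writer updates its token state), the usual guard is to take the group's lock around both paths. A minimal sketch with hypothetical field names, not PD's actual code:

```go
package main

import (
	"fmt"
	"sync"
)

// guardedGroup is a hypothetical illustration: all reads and updates of the
// token state go through the same RWMutex, so a reader can never observe a
// half-updated intermediate value.
type guardedGroup struct {
	mu     sync.RWMutex
	tokens float64
}

func (g *guardedGroup) update(delta float64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.tokens += delta
}

func (g *guardedGroup) snapshot() float64 {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.tokens
}

func main() {
	g := &guardedGroup{tokens: 100}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				g.update(-0.1)
				_ = g.snapshot()
			}
		}()
	}
	wg.Wait()
	fmt.Printf("final tokens: %.1f\n", g.snapshot())
}
```

Running the same loops without the mutex under `go run -race` would report the conflicting accesses, which would help confirm whether v8.5.1 still has the same class of bug as #7206.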