milvus
[Bug]: [laion1b-test] mixcoord panic: runtime error: invalid memory address or nil pointer dereference
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: cardinal-milvus-io-2.3-b2d3278-20240206
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.3.6rc3
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
- create collection laion_stable_3 with 64 num_partitions (partition-key field); its schema is:
{'auto_id': False, 'description': '', 'fields': [{'name': 'id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'float_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 768}}, {'name': 'int64_pk_5b', 'description': '', 'type': <DataType.INT64: 5>, 'is_partition_key': True}, {'name': 'varchar_caption', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 8192}}, {'name': 'varchar_NSFW', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 8192}}, {'name': 'float64_similarity', 'description': '', 'type': <DataType.FLOAT: 10>}, {'name': 'int64_width', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_height', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_original_width', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'int64_original_height', 'description': '', 'type': <DataType.INT64: 5>}, {'name': 'varchar_md5', 'description': '', 'type': <DataType.VARCHAR: 21>, 'params': {'max_length': 8192}}], 'enable_dynamic_field': True}
- build hnsw index and load collection
- insert 50m-768d data and flush -> index again -> load again
- concurrent: insert + delete + flush + search + query
'concurrent_params': {'concurrent_number': 100,
'during_time': '10h',
'interval': 120,
'spawn_rate': None},
'concurrent_tasks': [{'type': 'query',
'weight': 4,
'params': {'expr': '50000000 '
'< '
'id '
'< '
'5010000',
'timeout': 1200}},
{'type': 'search',
'weight': 25,
'params': {'nq': 10,
'top_k': 100,
'random_data': True,
'search_param': {'ef': 100},
'timeout': 600}},
{'type': 'insert',
'weight': 10,
'params': {'nb': 200,
'start_id': 50000000,
'random_id': True,
'random_vector': True,
'timeout': 600}},
{'type': 'delete',
'weight': 10,
'params': {'delete_length': 100,
'timeout': 600}},
{'type': 'flush',
'weight': 1,
'params': {'timeout': 600}}]},
- problems: a. mixcoord panic: mc_qqg4d_pre_panic.log
\":\"files/insert_log/447619508238296599/447619508238296630/447619508291373964/102/447619508290724826\",\"log_size\":1584}]}]"]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x3caf1ee]
goroutine 26463 [running]:
panic({0x4b333a0, 0x74ac860})
/usr/local/go/src/runtime/panic.go:987 +0x3bb fp=0xc021134090 sp=0xc021133fd0 pc=0x1a99cdb
runtime.panicmem(...)
/usr/local/go/src/runtime/panic.go:260
runtime.sigpanic()
/usr/local/go/src/runtime/signal_unix.go:841 +0x37d fp=0xc0211340f0 sp=0xc021134090 pc=0x1ab1fdd
github.com/milvus-io/milvus/internal/distributed/datanode/client.wrapGrpcCall[...]({0x56e7a78, 0xc033927f50?}, 0x0, 0xc036ab76c0)
/go/src/github.com/milvus-io/milvus/internal/distributed/datanode/client/client.go:90 +0xae fp=0xc021134128 sp=0xc0211340f0 pc=0x3caf1ee
github.com/milvus-io/milvus/internal/distributed/datanode/client.(*Client).GetMetrics(0xc010e1c120?, {0x56e7a78?, 0xc033927f50}, 0x40b0000000000000?, {0x652?, 0x27?, 0x26?})
/go/src/github.com/milvus-io/milvus/internal/distributed/datanode/client/client.go:168 +0x107 fp=0xc021134188 sp=0xc021134128 pc=0x3ca6b27
github.com/milvus-io/milvus/internal/datacoord.(*Server).getDataNodeMetrics(_, {_, _}, _, _)
/go/src/github.com/milvus-io/milvus/internal/datacoord/metrics_info.go:154 +0x134 fp=0xc021134418 sp=0xc021134188 pc=0x3deadf4
github.com/milvus-io/milvus/internal/datacoord.(*Server).getSystemInfoMetrics(0xc00138f8c0, {0x56e7a78, 0xc033927f50}, 0x0?)
/go/src/github.com/milvus-io/milvus/internal/datacoord/metrics_info.go:63 +0x1d8 fp=0xc021134ed8 sp=0xc021134418 pc=0x3dea1d8
github.com/milvus-io/milvus/internal/datacoord.(*Server).GetMetrics(0xc00138f8c0, {0x56e7a78, 0xc033927f50}, 0xc033994f80)
/go/src/github.com/milvus-io/milvus/internal/datacoord/services.go:987 +0x1b8 fp=0xc0211353f8 sp=0xc021134ed8 pc=0x3e10278
github.com/milvus-io/milvus/internal/distributed/datacoord.(*Server).GetMetrics(0xc0327af000?, {0x56e7a78?, 0xc033927f50?}, 0xc03399a338?)
/go/src/github.com/milvus-io/milvus/internal/distributed/datacoord/service.go:352 +0x2f fp=0xc021135428 sp=0xc0211353f8 pc=0x3e2cb2f
github.com/milvus-io/milvus/internal/proto/datapb._DataCoord_GetMetrics_Handler.func1({0x56e7a78, 0xc033927f50}, {0x4e7eb40?, 0xc033994f80})
/go/src/github.com/milvus-io/milvus/internal/proto/datapb/data_coord.pb.go:6806 +0x7b fp=0xc021135468 sp=0xc021135428 pc=0x2947afb
b. many flush requests hit the 120s timeout
c. datanode laion1b-test-2-milvus-datanode-5989b844f5-pw5sg was OOMKilled, and laion1b-test-2-milvus-datanode-5989b844f5-zm6lp terminated with Error (exit code 1)
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
- argo: laion1b-test-cron-3
- grafana: metrics of laion1b-test-2
- pods:
laion1b-test-2-etcd-0 1/1 Running 1 (40d ago) 67d
laion1b-test-2-etcd-1 1/1 Running 0 67d
laion1b-test-2-etcd-2 1/1 Running 0 67d
laion1b-test-2-milvus-datanode-5989b844f5-pw5sg 1/1 Running 121 (10h ago) 14d
laion1b-test-2-milvus-datanode-5989b844f5-zm6lp 1/1 Running 116 (10h ago) 14d
laion1b-test-2-milvus-indexnode-7bb59785b5-46clt 1/1 Running 0 14d
laion1b-test-2-milvus-indexnode-7bb59785b5-7hfxj 1/1 Running 0 14d
laion1b-test-2-milvus-indexnode-7bb59785b5-gfp8l 1/1 Running 0 14d
laion1b-test-2-milvus-indexnode-7bb59785b5-jlrnw 1/1 Running 0 14d
laion1b-test-2-milvus-indexnode-7bb59785b5-knxwn 1/1 Running 0 14d
laion1b-test-2-milvus-mixcoord-868c566c7c-qqg4d 1/1 Running 22 (10h ago) 14d
laion1b-test-2-milvus-proxy-64b6d7787-zck8d 1/1 Running 1 (14d ago) 14d
laion1b-test-2-milvus-querynode-1-6f8889c79b-2j46t 1/1 Running 0 14d
laion1b-test-2-milvus-querynode-1-6f8889c79b-jfbtf 1/1 Running 0 14d
laion1b-test-2-milvus-querynode-1-6f8889c79b-nv75z 1/1 Running 0 14d
laion1b-test-2-milvus-querynode-1-6f8889c79b-w5dg2 1/1 Running 0 14d
laion1b-test-2-pulsar-bookie-0 1/1 Running 0 67d
laion1b-test-2-pulsar-bookie-1 1/1 Running 0 44d
laion1b-test-2-pulsar-bookie-2 1/1 Running 0 67d
laion1b-test-2-pulsar-broker-0 1/1 Running 0 61d
laion1b-test-2-pulsar-proxy-0 1/1 Running 0 67d
laion1b-test-2-pulsar-recovery-0 1/1 Running 0 67d
laion1b-test-2-pulsar-zookeeper-0 1/1 Running 0 67d
laion1b-test-2-pulsar-zookeeper-1 1/1 Running 0 67d
laion1b-test-2-pulsar-zookeeper-2 1/1 Running 0 67d
Anything else?
No response
/assign @xiaocai2333
/unassign
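Bug: a nil struct pointer is returned as an interface value, so the interface is not nil. defaultSessionCreator forwards the (*Client, error) result of NewClient directly as (types.DataNodeClient, error); when NewClient returns a nil *Client, it is wrapped in a non-nil types.DataNodeClient, and calling a method on it dereferences the nil pointer.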
func defaultSessionCreator() dataNodeCreatorFunc {
    return func(ctx context.Context, addr string, nodeID int64) (types.DataNodeClient, error) {
        return grpcdatanodeclient.NewClient(ctx, addr, nodeID) // default
    }
}

func NewClient(ctx context.Context, addr string, nodeID int64) (*Client, error) {
    ...
}
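The pitfall described above can be reproduced outside Milvus. The following is a minimal sketch, not the Milvus code: DataNodeClient, Client, NewClient, buggyCreator and fixedCreator are simplified stand-ins invented for illustration, and the thread does not say whether the actual fix took this shape. It shows that forwarding a (*Client, error) result as (interface, error) yields a non-nil interface when the pointer is nil, and that checking the error and returning an explicit nil avoids it.

```go
package main

import (
	"errors"
	"fmt"
)

// DataNodeClient is a hypothetical, one-method stand-in for types.DataNodeClient.
type DataNodeClient interface {
	GetMetrics() string
}

// Client is a hypothetical stand-in for the gRPC datanode client.
type Client struct{ addr string }

// GetMetrics reads a field of c, so it panics when c is a nil pointer.
func (c *Client) GetMetrics() string { return "metrics from " + c.addr }

// NewClient mimics a constructor that returns a nil *Client together with an error.
func NewClient(addr string) (*Client, error) {
	if addr == "" {
		return nil, errors.New("address is empty")
	}
	return &Client{addr: addr}, nil
}

// buggyCreator forwards the (*Client, error) pair directly. A nil *Client is
// converted to a DataNodeClient that carries a type but no value, so the
// returned interface compares unequal to nil.
func buggyCreator(addr string) (DataNodeClient, error) {
	return NewClient(addr)
}

// fixedCreator checks the error first and returns an untyped nil interface.
func fixedCreator(addr string) (DataNodeClient, error) {
	c, err := NewClient(addr)
	if err != nil {
		return nil, err
	}
	return c, nil
}

func main() {
	buggy, _ := buggyCreator("")
	fixed, _ := fixedCreator("")
	fmt.Println("buggy == nil:", buggy == nil) // false: typed nil inside the interface
	fmt.Println("fixed == nil:", fixed == nil) // true
}
```

Any caller that drops or defers the error check and relies on `client != nil` would accept the value from buggyCreator and then panic on the first method call, which is consistent with the nil pointer dereference in the stack trace above.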
The panic did not appear again.