[Bug]: Crash at startup
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: v2.1.0-hotfix-dcd6c9e
- Deployment mode (standalone or cluster): standalone
- SDK version (e.g. pymilvus v2.0.0rc2): 2.0.2
- OS (Ubuntu or CentOS): Ubuntu
- CPU/Memory: QEMU virtual CPU on a Ryzen 3060x, 30 GB of memory
- GPU: /
- Others: /
Current Behavior
When starting Milvus with docker-compose, the milvusdb/milvus container crashes. It worked before and only started crashing after I inserted about 15 million entities.
Expected Behavior
Successful startup.
Steps To Reproduce
Start Milvus with a collection containing about 15 million entities.
Milvus Log
...
[2022/08/11 13:55:00.459 +00:00] [WARN] [querynode/shard_node_detector.go:104] ["Node not found in session"] ["node id"=114]
[2022/08/11 13:55:00.459 +00:00] [WARN] [querynode/shard_node_detector.go:104] ["Node not found in session"] ["node id"=92]
[2022/08/11 13:55:00.459 +00:00] [INFO] [querynode/shard_cluster.go:203] ["ShardCluster add node"] [nodeID=176]
[2022/08/11 13:55:00.459 +00:00] [INFO] [querynode/shard_segment_detector.go:74] ["segmentDetector start watch"] [collectionID=435196158120820737] [replicaID=435166658906554448] [vchannelName=by-dev-rootcoord-dml_24_435196158120820737v0] [rootPath=by-dev/meta/queryCoord-segmentMeta/435196158120820737]
[2022/08/11 13:55:00.461 +00:00] [INFO] [querynode/shard_cluster.go:435] ["Shard Cluster update state"] [collectionID=435196158120820737] [replicaID=435166658906554448] [channel=by-dev-rootcoord-dml_24_435196158120820737v0] ["old state"=2] ["new state"=1] [caller=github.com/milvus-io/milvus/internal/querynode.(*ShardCluster).healthCheck]
[2022/08/11 13:55:00.461 +00:00] [INFO] [querynode/shard_cluster_service.go:80] ["successfully add shard cluster"] [collectionID=435196158120820737] [replica=435166658906554448] [vchan=by-dev-rootcoord-dml_24_435196158120820737v0]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x60 pc=0x2242630]
goroutine 263 [running]:
github.com/milvus-io/milvus/internal/querynode.(*watchDmChannelsTask).Execute(0xc093fee280, 0x43872d0, 0xc0007d6200, 0x0, 0x0)
/go/src/github.com/milvus-io/milvus/internal/querynode/task.go:179 +0xcb0
github.com/milvus-io/milvus/internal/querynode.(*taskScheduler).processTask(0xc000e3a080, 0x439d730, 0xc093fee280, 0x439d5d0, 0xc00005a140)
/go/src/github.com/milvus-io/milvus/internal/querynode/task_scheduler.go:110 +0x173
github.com/milvus-io/milvus/internal/querynode.(*taskScheduler).taskLoop(0xc000e3a080)
/go/src/github.com/milvus-io/milvus/internal/querynode/task_scheduler.go:127 +0x15a
created by github.com/milvus-io/milvus/internal/querynode.(*taskScheduler).Start
/go/src/github.com/milvus-io/milvus/internal/querynode/task_scheduler.go:135 +0x67
Anything else?
No response
@zigapk Could you please refer to this script to export the whole Milvus logs for investigation?
/assign @zigapk /unassign
Log file from docker-compose available here.
/assign @soothing-rain /unassign @zigapk
/unassign /assign @wayblink
This can be quickly fixed with:
len(ufInfo.Binlogs) --> len(ufInfo.GetBinlogs())
However, we need to figure out why this un-flushed segmentID was not found in req.SegmentInfo, and what to do if this happens.
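For context, the reason the getter call avoids the crash is the nil-safe getter convention of protoc-generated Go structs. A minimal self-contained sketch (SegmentInfo and its Binlogs field are stand-ins for the generated types, not Milvus's exact definitions):

package main

import "fmt"

// SegmentInfo stands in for the protobuf-generated segment-info struct.
type SegmentInfo struct {
	Binlogs []string
}

// GetBinlogs mirrors the protoc-gen-go getter convention: it checks the
// receiver for nil before reading the field, so calling it on a nil
// *SegmentInfo returns nil instead of panicking.
func (s *SegmentInfo) GetBinlogs() []string {
	if s == nil {
		return nil
	}
	return s.Binlogs
}

func main() {
	// Simulate the failure: the un-flushed segment ID was not found in
	// req.SegmentInfo, so the lookup yielded a nil pointer.
	var ufInfo *SegmentInfo

	// len(ufInfo.Binlogs) would dereference the nil pointer and crash with
	// the SIGSEGV shown in the stack trace above, while
	// len(ufInfo.GetBinlogs()) safely evaluates to 0 (len of a nil slice).
	fmt.Println(len(ufInfo.GetBinlogs()))
}

Note that the getter only masks the symptom; the missing-segment condition itself still needs an explanation.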
I don't think this is the issue. I've configured all of the parameters related to that and the behaviour stays the same. My milvus.yaml:
# Licensed to the LF AI & Data foundation under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Related configuration of etcd, used to store Milvus metadata.
etcd:
endpoints:
- localhost:2379
rootPath: by-dev # The root path where data is stored in etcd
metaSubPath: meta # metaRootPath = rootPath + '/' + metaSubPath
kvSubPath: kv # kvRootPath = rootPath + '/' + kvSubPath
log:
# path is one of:
# - "default" as os.Stderr,
# - "stderr" as os.Stderr,
# - "stdout" as os.Stdout,
# - file path to append server logs to.
# please adjust in embedded Milvus: /tmp/milvus/logs/etcd.log
path: stdout
level: info # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
use:
# please adjust in embedded Milvus: true
embed: false # Whether to enable embedded Etcd (an in-process EtcdServer).
data:
# Embedded Etcd only.
# please adjust in embedded Milvus: /tmp/milvus/etcdData/
dir: default.etcd
ssl:
enabled: false # Whether to support ETCD secure connection mode
tlsCert: /path/to/etcd-client.pem # path to your cert file
tlsKey: /path/to/etcd-client-key.pem # path to your key file
tlsCACert: /path/to/ca.pem # path to your CACert file
# TLS min version
# Optional values: 1.0, 1.1, 1.2, 1.3.
# We recommend using version 1.2 and above
tlsMinVersion: 1.3
# please adjust in embedded Milvus: /tmp/milvus/data/
localStorage:
path: /var/lib/milvus/data/
# Related configuration of minio, which is responsible for data persistence for Milvus.
minio:
address: fra1.digitaloceanspaces.com # Address of MinIO/S3
port: 443 # Port of MinIO/S3
accessKeyID: asdf # accessKeyID of MinIO/S3
secretAccessKey: asdf # MinIO/S3 encryption string
useSSL: true # Access to MinIO/S3 with SSL
bucketName: asdf # Bucket name in MinIO/S3
rootPath: files # The root path where the message is stored in MinIO/S3
# Whether to use AWS IAM role to access S3 instead of access/secret keys
# For more information, refer to https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use.html
useIAM: false
# Custom endpoint for fetching IAM role credentials.
# Leave it empty to use the AWS default endpoint
iamEndpoint: ""
# Milvus supports three MQs: rocksmq (based on RocksDB), Pulsar, and Kafka; only the one you use should be kept in this config.
# If multiple MQs are configured in this file, the enabling priority is:
# 1. standalone (local) mode: rocksmq (default) > Pulsar > Kafka
# 2. cluster mode: Pulsar (default) > Kafka (rocksmq is unsupported)
# Related configuration of pulsar, used to manage Milvus logs of recent mutation operations, output streaming log, and provide log publish-subscribe services.
pulsar:
address: localhost # Address of pulsar
port: 6650 # Port of pulsar
webport: 80 # Web port of pulsar; if you connect directly without a proxy, use 8080
maxMessageSize: 5242880 # 5 * 1024 * 1024 Bytes, Maximum size of each message in pulsar.
# If you want to enable kafka, comment out the pulsar configs
#kafka:
# brokerList: localhost1:9092,localhost2:9092,localhost3:9092
# saslUsername: username
# saslPassword: password
rocksmq:
# please adjust in embedded Milvus: /tmp/milvus/rdb_data
path: /var/lib/milvus/rdb_data # The path where the message is stored in rocksmq
rocksmqPageSize: 2147483648 # 2 GB, 2 * 1024 * 1024 * 1024 bytes, The size of each page of messages in rocksmq
retentionTimeInMinutes: 60 # The retention time of messages in rocksmq, in minutes (default: 10080 = 7 * 24 * 60, i.e. 7 days)
retentionSizeInMB: 8192 # 8 GB, 8 * 1024 MB, The retention size of the message in rocksmq.
lrucacheratio: 0.06 # rocksdb cache memory ratio
# Related configuration of rootCoord, used to handle data definition language (DDL) and data control language (DCL) requests
rootCoord:
address: localhost
port: 53100
dmlChannelNum: 256 # The number of dml channels created at system startup
maxPartitionNum: 4096 # Maximum number of partitions in a collection
minSegmentSizeToEnableIndex: 1024 # Threshold: segments smaller than this value will not be indexed
# (in seconds) Duration after which an import task will expire (be killed). Default 900 seconds (15 minutes).
# Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
importTaskExpiration: 900
# (in seconds) Milvus will keep the record of import tasks for at least `importTaskRetention` seconds. Default 86400
# seconds (24 hours).
# Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
importTaskRetention: 86400
# (in seconds) Check an import task's segment loading state in queryNodes every `importSegmentStateCheckInterval`
# seconds. Default 10 seconds.
# Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
importSegmentStateCheckInterval: 10
# (in seconds) Maximum time to wait for segments in a single import task to be loaded in queryNodes.
# Default 60 seconds (1 minute).
# Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
importSegmentStateWaitLimit: 60
# (in seconds) Check the building status of a task's segments' indices every `importIndexCheckInterval` seconds.
# Default 10 seconds.
# Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
importIndexCheckInterval: 10
# (in seconds) Maximum time to wait for indices to be built on a single import task's segments.
# Default 600 seconds (10 minutes).
# Note: If default value is to be changed, change also the default in: internal/util/paramtable/component_param.go
importIndexWaitLimit: 600
# Related configuration of proxy, used to validate client requests and reduce the returned results.
proxy:
port: 19530
internalPort: 19529
http:
enabled: true # Whether to enable the http server
debug_mode: false # Whether to enable http server debug mode
timeTickInterval: 200 # ms, the interval at which the proxy synchronizes the time tick
msgStream:
timeTick:
bufSize: 512
maxNameLength: 255 # Maximum length of name for a collection or alias
maxFieldNum: 256 # Maximum number of fields in a collection
maxDimension: 32768 # Maximum dimension of a vector
maxShardNum: 256 # Maximum number of shards in a collection
maxTaskNum: 1024 # maximum number of tasks in the proxy task queue
# please adjust in embedded Milvus: false
ginLogging: true # Whether to produce gin logs.
# Related configuration of queryCoord, used to manage topology and load balancing for the query nodes, and handoff from growing segments to sealed segments.
queryCoord:
address: localhost
port: 19531
autoHandoff: true # Enable auto handoff
autoBalance: true # Enable auto balance
overloadedMemoryThresholdPercentage: 90 # The memory usage threshold percentage at which a node is considered overloaded
balanceIntervalSeconds: 60
memoryUsageMaxDifferencePercentage: 30
# Related configuration of queryNode, used to run hybrid search between vector and scalar data.
queryNode:
cacheSize: 32 # GB, default 32 GB, `cacheSize` is the memory used for caching data for faster query. The `cacheSize` must be less than system memory size.
port: 21123
loadMemoryUsageFactor: 3 # The multiplication factor for estimating memory usage while loading segments
stats:
publishInterval: 1000 # Interval for querynode to report node information (milliseconds)
dataSync:
flowGraph:
maxQueueLength: 1024 # Maximum length of task queue in flowgraph
maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
# Segcore divides a segment into multiple chunks to enable the small index
segcore:
chunkRows: 1024 # The number of vectors in a chunk.
# Note: the segment small index has been disabled since 2022-05-12, so the configurations below have no effect.
# No small index is created for growing segments; searches on those segments directly use brute-force scan.
smallIndex:
nlist: 128 # small index nlist; recommended to be sqrt(chunkRows), and must be smaller than chunkRows/8
nprobe: 16 # nprobe for searching the small index, based on your accuracy requirement; must be smaller than nlist
cache:
enabled: true
memoryLimit: 2147483648 # 2 GB, 2 * 1024 *1024 *1024
scheduler:
receiveChanSize: 10240
unsolvedQueueSize: 10240
maxReadConcurrency: 0 # maximum concurrency of read tasks; 0 or less means no upper limit
cpuRatio: 10.0 # ratio used to estimate read task CPU usage
grouping:
enabled: true
maxNQ: 1000
topKMergeRatio: 10.0
indexCoord:
address: localhost
port: 31000
gc:
interval: 600 # gc interval in seconds
indexNode:
port: 21121
scheduler:
buildParallel: 1
dataCoord:
address: localhost
port: 13333
enableCompaction: true # Enable data segment compaction
enableGarbageCollection: true
segment:
maxSize: 512 # Maximum size of a segment in MB
sealProportion: 0.25 # The minimum proportion of maxSize at which a segment can be sealed
assignmentExpiration: 2000 # The time of the assignment expiration in ms
maxLife: 86400 # The maximum lifetime of a segment in seconds (24*60*60)
compaction:
enableAutoCompaction: true
gc:
interval: 600 # gc interval in seconds
missingTolerance: 86400 # file meta missing tolerance duration in seconds (24*60*60)
dropTolerance: 86400 # tolerance duration in seconds for files belonging to dropped entities (24*60*60)
dataNode:
port: 21124
dataSync:
flowGraph:
maxQueueLength: 1024 # Maximum length of task queue in flowgraph
maxParallelism: 1024 # Maximum number of tasks executed in parallel in the flowgraph
flush:
# Max buffer size to flush for a single segment.
insertBufSize: 16777216 # Bytes, 16 MB
# Configures the system log output.
log:
level: warn # Only supports debug, info, warn, error, panic, or fatal. Default 'info'.
file:
# please adjust in embedded Milvus: /tmp/milvus/logs
rootPath: "" # default to stdout, stderr
maxSize: 300 # MB
maxAge: 1 # Maximum log retention time, in days
maxBackups: 20
format: text # text/json
grpc:
log:
level: WARNING
serverMaxRecvSize: 2147483647 # math.MaxInt32
serverMaxSendSize: 2147483647 # math.MaxInt32
clientMaxRecvSize: 104857600 # 100 MB, 100 * 1024 * 1024
clientMaxSendSize: 104857600 # 100 MB, 100 * 1024 * 1024
client:
dialTimeout: 5000
keepAliveTime: 10000
keepAliveTimeout: 20000
maxMaxAttempts: 5
initialBackOff: 1.0
maxBackoff: 60.0
backoffMultiplier: 2.0
# Configure the proxy tls enable.
tls:
serverPemPath: configs/cert/server.pem
serverKeyPath: configs/cert/server.key
caPemPath: configs/cert/ca.pem
common:
# Channel name generation rule: ${namePrefix}-${ChannelIdx}
chanNamePrefix:
cluster: "by-dev"
rootCoordTimeTick: "rootcoord-timetick"
rootCoordStatistics: "rootcoord-statistics"
rootCoordDml: "rootcoord-dml"
rootCoordDelta: "rootcoord-delta"
search: "search"
searchResult: "searchResult"
queryTimeTick: "queryTimeTick"
queryNodeStats: "query-node-stats"
# Cmd for loadIndex, flush, etc...
cmd: "cmd"
dataCoordStatistic: "datacoord-statistics-channel"
dataCoordTimeTick: "datacoord-timetick-channel"
dataCoordSegmentInfo: "segment-info-channel"
# Sub name generation rule: ${subNamePrefix}-${NodeID}
subNamePrefix:
rootCoordSubNamePrefix: "rootCoord"
proxySubNamePrefix: "proxy"
queryNodeSubNamePrefix: "queryNode"
dataNodeSubNamePrefix: "dataNode"
dataCoordSubNamePrefix: "dataCoord"
defaultPartitionName: "_default" # default partition name for a collection
defaultIndexName: "_default_idx" # default index name
retentionDuration: 60 # The retention duration in seconds (default: 432000 = 5 days)
entityExpiration: -1 # Entity expiration in seconds. CAUTION: make sure entityExpiration >= retentionDuration; -1 means never expire
gracefulTime: 5000 # milliseconds. The interval by which the request arrival time is shifted back under Bounded consistency
# Default value: auto
# Valid values: [auto, avx512, avx2, avx, sse4_2]
# This configuration is only used by querynode and indexnode, it selects CPU instruction set for Searching and Index-building.
simdType: auto
indexSliceSize: 16 # MB
# please adjust in embedded Milvus: local
storageType: minio
security:
authorizationEnabled: false
# tls mode values [0, 1, 2]
# 0 is close, 1 is one-way authentication, 2 is two-way authentication.
tlsMode: 0
mem_purge_ratio: 0.2 # on Linux, if memory-fragmentation-size >= used-memory * ${mem_purge_ratio}, malloc_trim is performed
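As an aside, the mem_purge_ratio condition on the last line of the config reduces to a single comparison. A schematic sketch with illustrative names (not Milvus's actual implementation):

package main

import "fmt"

// shouldPurge models the comment on mem_purge_ratio: malloc_trim is
// triggered once the fragmentation size reaches usedBytes * purgeRatio.
func shouldPurge(fragmentationBytes, usedBytes uint64, purgeRatio float64) bool {
	return float64(fragmentationBytes) >= float64(usedBytes)*purgeRatio
}

func main() {
	// With the configured ratio of 0.2: 3 GB of fragmentation against
	// 10 GB of used memory exceeds the threshold, so a purge would fire.
	fmt.Println(shouldPurge(3<<30, 10<<30, 0.2)) // true
}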
The log you uploaded shows an obvious null-pointer bug. It has been fixed in the latest code. Could you upgrade your Milvus and retry? Please let me know if you run into any problems while retrying.
/close
@wayblink: Closing this issue.
In response to this:
/close
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.