
[Bug]: querynode restarts due to `SIGSEGV: segmentation violation` after etcd follower pod failure chaos test

Status: Open · zhuwenxing opened this issue 1 year ago · 16 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: master-20240814-c42976ee-amd64
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I20240814 09:38:05.100278  6108 SegmentSealedImpl.cpp:108] [SERVER][LoadVecIndex][milvus] Before setting field_bit for field index, fieldID:111. segmentID:451838354631885067, 
I20240814 09:38:05.100486  6108 SegmentSealedImpl.cpp:125] [SERVER][LoadVecIndex][milvus] Has load vec index done, fieldID:111. segmentID:451838354631885067, 
[2024/08/14 09:38:05.100 +00:00] [INFO] [segments/segment.go:1207] ["updateSegmentIndex done"] [traceID=d3b3e901a43f7bf13fa720efa7d76e14] [collectionID=451838354629667593] [partitionID=451838354629667594] [segmentID=451838354631885067] [fieldID=111]
I20240814 09:38:05.100801  6111 load_index_c.cpp:236] [SERVER][AppendIndexV2][milvus] [collection=451838354629667593][segment=451838354631885067][field=100][enable_mmap=false] load index 451838354629667625
[2024-08-14T09:38:05Z INFO  tantivy::indexer::segment_updater] save metas
add<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<>&&, folly::futures::detail::State)::<lambda(folly::Executor::KeepAlive<>&&)> >
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/Executor.h:186 pc=0x7f6b52b2334c
operator()<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<>&&, folly::futures::detail::State)::<lambda(folly::Executor::KeepAlive<>&&)> >
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/futures/detail/Core.cpp:583 pc=0x7f6b52b2334c
_ZN5folly7futures6detail8CoreBase10doCallbackEONS_8Executor9KeepAliveIS3_EENS1_5StateE
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/futures/detail/Core.cpp:608 pc=0x7f6b52b2334c
_ZN5folly7futures6detail8CoreBase12setCallback_EONS_8FunctionIFvRS2_ONS_8Executor9KeepAliveIS5_EEPNS_17exception_wrapperEEEEOSt10shared_ptrINS_14RequestContextEENS1_18InlineContinuationE
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/futures/detail/Core.cpp:468 pc=0x7f6b52b24053
I20240814 09:38:05.205202  6111 load_index_c.cpp:300] [SERVER][AppendIndexV2][milvus] [collection=451838354629667593][segment=451838354631885067][field=100][enable_mmap=false] load index 451838354629667625 done
[2024/08/14 09:38:05.205 +00:00] [INFO] [segments/segment.go:1207] ["updateSegmentIndex done"] [traceID=d3b3e901a43f7bf13fa720efa7d76e14] [collectionID=451838354629667593] [partitionID=451838354629667594] [segmentID=451838354631885067] [fieldID=100]
I20240814 09:38:05.205718  6111 load_index_c.cpp:236] [SERVER][AppendIndexV2][milvus] [collection=451838354629667593][segment=451838354631885067][field=101][enable_mmap=false] load index 451838354629667646
[2024-08-14T09:38:05Z INFO  tantivy::indexer::segment_updater] save metas
setCallback<folly::futures::detail::FutureBase<folly::Unit>::thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> 
>(folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>&&, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::InlineContinuation)::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/detail/Core.h:632 pc=0x7f6b59d86277
setCallback_<folly::futures::detail::FutureBase<folly::Unit>::thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> 
>(folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>&&, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::InlineContinuation)::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:310 pc=0x7f6b59d86277
setCallback_<folly::futures::detail::FutureBase<folly::Unit>::thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> 
>(folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>&&, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::InlineContinuation)::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:318 pc=0x7f6b59d86277
thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> >
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:379 pc=0x7f6b59d86277
thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:945 pc=0x7f6b59d86277
then<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>
	/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future.h:1240 pc=0x7f6b59d86277
asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >
	/workspace/source/internal/core/src/futures/Future.h:188 pc=0x7f6b59d86277
async<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >
	/workspace/source/internal/core/src/futures/Future.h:98 pc=0x7f6b59d86277
AsyncSearch
	/workspace/source/internal/core/src/segcore/segment_c.cpp:121 pc=0x7f6b59d86277
_cgo_548efe5569b7_Cfunc_AsyncSearch
	/tmp/go-build/cgo-gcc-prolog:121 pc=0x501a1ec
runtime.asmcgocall
	/usr/local/go/src/runtime/asm_amd64.s:872 pc=0x1ef4087


SIGSEGV: segmentation violation
PC=0x7f6b52987c89 m=3092 sigcode=1
signal arrived during cgo execution

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17071/pipeline

log: artifacts-etcd-followers-pod-failure-17071-server-logs.tar.gz

Anything else?

No response

zhuwenxing, Aug 15 '24 02:08

/assign @weiliu1031 could you please have a look? Thanks

binbinlv, Aug 15 '24 03:08

@zhuwenxing Is this only an issue on master? Is this on ARM or x86?

xiaofan-luan, Aug 15 '24 15:08

@zhuwenxing

Please make sure you are using the configuration with no clusterIP (headless services) for the etcd kill tests. I see some errors showing that etcd is not connected. Check with @LoveEachDay and make sure you use the correct setup.

Ideally, we shouldn't see a panic when the etcd connection fails:

[2024/08/14 09:08:21.394 +00:00] [DEBUG] [querynode/service.go:118] ["QueryNode connect to etcd failed"] [error="context deadline exceeded"]
[2024/08/14 09:08:21.394 +00:00] [ERROR] [components/query_node.go:56] ["QueryNode starts error"] [error="context deadline exceeded"] [stack="github.com/milvus-io/milvus/cmd/components.(*QueryNode).Run\n\t/workspace/source/cmd/components/query_node.go:56\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/workspace/source/cmd/roles/roles.go:121"]
panic: context deadline exceeded

Still checking the SIGSEGV issue.

xiaofan-luan, Aug 15 '24 21:08

@zhuwenxing Is this issue reproducible?

chyezh, Aug 16 '24 02:08

/assign chyezh

chyezh, Aug 16 '24 02:08

@xiaofan-luan It has only been reproduced on master. It's amd64 (x86), because the testing cluster consists of amd64 machines.

@LoveEachDay The instance was created with the Helm chart milvus-4.2.5.tgz. Can you help check the setup?

@chyezh It doesn't reproduce reliably; so far it has only happened once.

zhuwenxing, Aug 16 '24 02:08

(screenshot) Three headless-service addresses are used for the three etcd members, with etcd 3.5.14.

LoveEachDay, Aug 16 '24 06:08

The crash happened in an async search on segment 451838354632090846:

[2024/08/14 09:37:56.144 +00:00] [DEBUG] [segments/segment.go:499] ["search segment..."] [traceID=0afd39eaef3452ce9e8c8832ac9a6c58] [collectionID=451838354629467151] [segmentID=451838354632090846] [segmentType=Sealed] [withIndex=false]

SIGNAL CATCH BY NON-GO SIGNAL HANDLER
SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: 0x7f6864980050

But this segment was still loading:

[2024/08/14 09:37:58.006 +00:00] [INFO] [segments/segment_loader.go:541] ["start loading remote..."] [traceID=0b7133f7d6374b23acbde92672342745] [collectionID=451838354629467151] [segmentIDs="[451838354632090846]"] [segmentNum=1]
[2024/08/14 09:37:58.006 +00:00] [INFO] [segments/segment_loader.go:551] ["loading bloom filter for remote..."] [traceID=0b7133f7d6374b23acbde92672342745] [collectionID=451838354629467151] [segmentIDs="[451838354632090846]"]
[2024/08/14 09:37:58.015 +00:00] [INFO] [segments/segment_loader.go:945] ["Successfully load pk stats"] [traceID=0b7133f7d6374b23acbde92672342745] [segmentID=451838354632090846] [time=9.151753ms] [size=34304]

cqy123456, Aug 16 '24 07:08

Loading of this segment had already finished:

[2024/08/14 09:37:54.495 +00:00] [INFO] [querynodev2/services.go:492] ["load segments done..."] [traceID=5ed136892591447ab531c9fa37abd7d9] [collectionID=451838354629467151] [partitionID=451838354629467152] [shard=by-dev-rootcoord-dml_1_451838354629467151v0] [segmentID=451838354632090846] [level=L1] [currentNodeID=3] [segments="[451838354632090846]"]

The delete data was loaded at 09:37:58.

chyezh, Aug 16 '24 10:08

any progress?

xiaofan-luan, Aug 22 '24 19:08

Made ASan available for the Milvus binary and image (#35627), and I'm trying to reproduce it.

chyezh, Aug 22 '24 23:08

Also, some ODR violations (#35549, #35633) were found and fixed in #35610, but I'm not sure whether they are related to this issue.

chyezh, Aug 22 '24 23:08

Found an assertion failure while reproducing:

milvus: /go/src/github.com/milvus-io/milvus/internal/core/src/exec/expression/EvalCtx.h:36: milvus::exec::EvalCtx::EvalCtx(milvus::exec::ExecContext*, milvus::exec::ExprSet*, milvus::RowVector*): Assertion `expr_set_ != nullptr' failed.

chyezh, Aug 23 '24 02:08

The assertion failure is a separate, unrelated issue; see #35771. I'll try to reproduce this one again after the fix.

chyezh, Aug 28 '24 08:08

Regarding @xiaofan-luan's earlier comment about the QueryNode panic at 09:08:21 ("QueryNode connect to etcd failed", followed by panic: context deadline exceeded):

This happened during test initialization: etcd was not ready yet, and no etcd chaos had been injected. Therefore, this panic is expected.

[2024-08-14T09:07:23.151Z] + helm install --wait --debug --timeout 600s etcd-followers-pod-failure-17071 milvus/milvus --set image.all.repository=harbor.milvus.io/milvus/milvus --set image.all.tag=master-20240814-c42976ee-amd64 --set metrics.serviceMonitor.enabled=true --set etcd.metrics.enabled=true --set etcd.metrics.podMonitor.enabled=true --set etcd.metrics.podMonitor.namespace=chaos-testing --set quotaAndLimits.enabled=false -f ../cluster-values.yaml -n=chaos-testing
[2024-08-14T09:07:23.154Z] WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
[2024-08-14T09:07:23.154Z] WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
[2024-08-14T09:07:23.154Z] install.go:178: [debug] Original chart version: ""
[2024-08-14T09:07:24.083Z] install.go:195: [debug] CHART PATH: /root/.cache/helm/repository/milvus-4.2.4.tgz
[2024-08-14T09:07:24.083Z] 
[2024-08-14T09:07:25.011Z] client.go:128: [debug] creating 42 resource(s)
[2024-08-14T09:07:25.267Z] wait.go:48: [debug] beginning wait for 42 resources with timeout of 10m0s
[2024-08-14T09:07:26.191Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:29.453Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:31.970Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:34.491Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:37.757Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:40.271Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:43.550Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:46.072Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:48.643Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:51.904Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:54.417Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:56.929Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:00.194Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:02.721Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:05.544Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:08.061Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:11.335Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:13.851Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:17.119Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:19.633Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:22.152Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:25.424Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:27.939Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:30.455Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:33.720Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:36.233Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:39.498Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:42.013Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:44.525Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:47.794Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready

chyezh, Aug 28 '24 09:08

Can't reproduce it after the ODR fixes, with ASan enabled:

https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17504/pipeline/
https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17505/pipeline/
https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17506/pipeline/

chyezh, Aug 28 '24 12:08

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot], Sep 29 '24 17:09

Not reproduced

zhuwenxing, Sep 30 '24 03:09