milvus
[Bug]: querynode restarts due to `SIGSEGV: segmentation violation` after etcd follower pod failure chaos test
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version:master-20240814-c42976ee-amd64
- Deployment mode(standalone or cluster):
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
I20240814 09:38:05.100278 6108 SegmentSealedImpl.cpp:108] [SERVER][LoadVecIndex][milvus] Before setting field_bit for field index, fieldID:111. segmentID:451838354631885067,
I20240814 09:38:05.100486 6108 SegmentSealedImpl.cpp:125] [SERVER][LoadVecIndex][milvus] Has load vec index done, fieldID:111. segmentID:451838354631885067,
[2024/08/14 09:38:05.100 +00:00] [INFO] [segments/segment.go:1207] ["updateSegmentIndex done"] [traceID=d3b3e901a43f7bf13fa720efa7d76e14] [collectionID=451838354629667593] [partitionID=451838354629667594] [segmentID=451838354631885067] [fieldID=111]
I20240814 09:38:05.100801 6111 load_index_c.cpp:236] [SERVER][AppendIndexV2][milvus] [collection=451838354629667593][segment=451838354631885067][field=100][enable_mmap=false] load index 451838354629667625
[2024-08-14T09:38:05Z INFO tantivy::indexer::segment_updater] save metas
add<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<>&&, folly::futures::detail::State)::<lambda(folly::Executor::KeepAlive<>&&)> >
/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/Executor.h:186 pc=0x7f6b52b2334c
operator()<folly::futures::detail::CoreBase::doCallback(folly::Executor::KeepAlive<>&&, folly::futures::detail::State)::<lambda(folly::Executor::KeepAlive<>&&)> >
/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/futures/detail/Core.cpp:583 pc=0x7f6b52b2334c
_ZN5folly7futures6detail8CoreBase10doCallbackEONS_8Executor9KeepAliveIS3_EENS1_5StateE
/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/futures/detail/Core.cpp:608 pc=0x7f6b52b2334c
_ZN5folly7futures6detail8CoreBase12setCallback_EONS_8FunctionIFvRS2_ONS_8Executor9KeepAliveIS5_EEPNS_17exception_wrapperEEEEOSt10shared_ptrINS_14RequestContextEENS1_18InlineContinuationE
/root/.conan/data/folly/2023.10.30.08/milvus/dev/build/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/folly/futures/detail/Core.cpp:468 pc=0x7f6b52b24053
I20240814 09:38:05.205202 6111 load_index_c.cpp:300] [SERVER][AppendIndexV2][milvus] [collection=451838354629667593][segment=451838354631885067][field=100][enable_mmap=false] load index 451838354629667625 done
[2024/08/14 09:38:05.205 +00:00] [INFO] [segments/segment.go:1207] ["updateSegmentIndex done"] [traceID=d3b3e901a43f7bf13fa720efa7d76e14] [collectionID=451838354629667593] [partitionID=451838354629667594] [segmentID=451838354631885067] [fieldID=100]
I20240814 09:38:05.205718 6111 load_index_c.cpp:236] [SERVER][AppendIndexV2][milvus] [collection=451838354629667593][segment=451838354631885067][field=101][enable_mmap=false] load index 451838354629667646
[2024-08-14T09:38:05Z INFO tantivy::indexer::segment_updater] save metas
setCallback<folly::futures::detail::FutureBase<folly::Unit>::thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> 
>(folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>&&, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::InlineContinuation)::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >
/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/detail/Core.h:632 pc=0x7f6b59d86277
setCallback_<folly::futures::detail::FutureBase<folly::Unit>::thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> 
>(folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>&&, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::InlineContinuation)::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >
/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:310 pc=0x7f6b59d86277
setCallback_<folly::futures::detail::FutureBase<folly::Unit>::thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> 
>(folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>&&, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void>, folly::futures::detail::InlineContinuation)::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)> >
/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:318 pc=0x7f6b59d86277
thenImplementation<folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, folly::futures::detail::tryExecutorCallableResult<folly::Unit, folly::Future<folly::Unit>::thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>(milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&) &&::<lambda(folly::Executor::KeepAlive<>&&, folly::Try<folly::Unit>&&)>, void> >
/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:379 pc=0x7f6b59d86277
thenTry<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>
/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future-inl.h:945 pc=0x7f6b59d86277
then<milvus::futures::Future<milvus::SearchResult>::asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >(folly::Executor::KeepAlive<>, int, AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)>&&)::<lambda(auto:85&&)>&>
/root/.conan/data/folly/2023.10.30.08/milvus/dev/package/71e52ec7e6bdcb39e8f12e598f0e25527e54965c/include/folly/futures/Future.h:1240 pc=0x7f6b59d86277
asyncProduce<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >
/workspace/source/internal/core/src/futures/Future.h:188 pc=0x7f6b59d86277
async<AsyncSearch(CTraceContext, CSegmentInterface, CSearchPlan, CPlaceholderGroup, uint64_t)::<lambda(milvus::futures::CancellationToken)> >
/workspace/source/internal/core/src/futures/Future.h:98 pc=0x7f6b59d86277
AsyncSearch
/workspace/source/internal/core/src/segcore/segment_c.cpp:121 pc=0x7f6b59d86277
_cgo_548efe5569b7_Cfunc_AsyncSearch
/tmp/go-build/cgo-gcc-prolog:121 pc=0x501a1ec
runtime.asmcgocall
/usr/local/go/src/runtime/asm_amd64.s:872 pc=0x1ef4087
SIGSEGV: segmentation violation
PC=0x7f6b52987c89 m=3092 sigcode=1
signal arrived during cgo execution
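For triage, the `sigcode=1` in the crash header above can be decoded. This is a minimal sketch, assuming the standard Linux `si_code` values for SIGSEGV (1 = `SEGV_MAPERR`, 2 = `SEGV_ACCERR`). The helper name `decode_crash_header` is hypothetical and not part of Milvus or the Go runtime.

```python
import re

# Linux si_code values for SIGSEGV (assumed platform: Linux).
SEGV_CODES = {
    1: "SEGV_MAPERR (address not mapped to object)",
    2: "SEGV_ACCERR (invalid permissions for mapped object)",
}

# Matches the Go runtime crash header line, e.g. "PC=0x... m=3092 sigcode=1".
HEADER_RE = re.compile(r"PC=(0x[0-9a-f]+)\s+m=(\d+)\s+sigcode=(-?\d+)")


def decode_crash_header(line):
    """Return the parsed PC, m id, and sigcode meaning, or None if no match."""
    match = HEADER_RE.search(line)
    if match is None:
        return None
    code = int(match.group(3))
    return {
        "pc": match.group(1),
        "m": int(match.group(2)),
        "sigcode": code,
        "meaning": SEGV_CODES.get(code, "unknown"),
    }


info = decode_crash_header("PC=0x7f6b52987c89 m=3092 sigcode=1")
print(info["meaning"])
```

Under that assumption, `sigcode=1` points at an access to an unmapped address rather than a permission fault, which would be consistent with (though not proof of) touching memory that was freed or never mapped.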
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
failed job: https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17071/pipeline
log: artifacts-etcd-followers-pod-failure-17071-server-logs.tar.gz
Anything else?
No response
/assign @weiliu1031 could you please have a look? Thanks
@zhuwenxing Is this only an issue on master? Is this on ARM or x86?
@zhuwenxing
Please make sure you are using the version with no clusterIP for the etcd kill test. I see some errors indicating that etcd is not connected. Check with @LoveEachDay and make sure you use the correct setup.
Ideally we shouldn't see a panic when the etcd connection fails:
[2024/08/14 09:08:21.394 +00:00] [DEBUG] [querynode/service.go:118] ["QueryNode connect to etcd failed"] [error="context deadline exceeded"]
[2024/08/14 09:08:21.394 +00:00] [ERROR] [components/query_node.go:56] ["QueryNode starts error"] [error="context deadline exceeded"] [stack="github.com/milvus-io/milvus/cmd/components.(*QueryNode).Run\n\t/workspace/source/cmd/components/query_node.go:56\ngithub.com/milvus-io/milvus/cmd/roles.runComponent[...].func1\n\t/workspace/source/cmd/roles/roles.go:121"]
panic: context deadline exceeded
Still checking the SIGSEGV issue.
@zhuwenxing Has this issue been reproduced?
/assign chyezh
@xiaofan-luan It has only been reproduced on master. It's amd64 (x86), since the testing cluster consists of AMD machines.
@LoveEachDay The instance was created with Helm chart milvus-4.2.5.tgz; can you help check the setup?
@chyezh It is not stably reproducible. So far it has only happened once.
The setup uses three headless-service addresses for the three etcd members, with etcd 3.5.14.
The crash occurred in an async search on segment 451838354632090846:
[2024/08/14 09:37:56.144 +00:00] [DEBUG] [segments/segment.go:499] ["search segment..."] [traceID=0afd39eaef3452ce9e8c8832ac9a6c58] [collectionID=451838354629467151] [segmentID=451838354632090846] [segmentType=Sealed] [withIndex=false]
SIGNAL CATCH BY NON-GO SIGNAL HANDLER
SIGNO: 11; SIGNAME: Segmentation fault; SI_CODE: 1; SI_ADDR: 0x7f6864980050
But this segment was still loading:
[2024/08/14 09:37:58.006 +00:00] [INFO] [segments/segment_loader.go:541] ["start loading remote..."] [traceID=0b7133f7d6374b23acbde92672342745] [collectionID=451838354629467151] [segmentIDs="[451838354632090846]"] [segmentNum=1]
[2024/08/14 09:37:58.006 +00:00] [INFO] [segments/segment_loader.go:551] ["loading bloom filter for remote..."] [traceID=0b7133f7d6374b23acbde92672342745] [collectionID=451838354629467151] [segmentIDs="[451838354632090846]"]
[2024/08/14 09:37:58.015 +00:00] [INFO] [segments/segment_loader.go:945] ["Successfully load pk stats"] [traceID=0b7133f7d6374b23acbde92672342745] [segmentID=451838354632090846] [time=9.151753ms] [size=34304]
The load segment request itself had already completed:
[2024/08/14 09:37:54.495 +00:00] [INFO] [querynodev2/services.go:492] ["load segments done..."] [traceID=5ed136892591447ab531c9fa37abd7d9] [collectionID=451838354629467151] [partitionID=451838354629467152] [shard=by-dev-rootcoord-dml_1_451838354629467151v0] [segmentID=451838354632090846] [level=L1] [currentNodeID=3] [segments="[451838354632090846]"]
The delete data was loaded at 09:37:58.
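The ordering above (load done, then search, then remote loading) is easier to see once the lines for one segment are sorted by timestamp. A minimal sketch, assuming the bracketed Go log format shown in this thread; the function name `timeline` is hypothetical, and the sample lines are shortened copies of the logs above with most fields dropped.

```python
import re
from datetime import datetime

# Matches: [2024/08/14 09:37:56.144 +00:00] [LEVEL] [file.go:123] ["message"] ...
LOG_RE = re.compile(
    r'^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) [+-]\d{2}:\d{2}\] '
    r'\[(?P<level>\w+)\] \[(?P<src>[^\]]+)\] \["(?P<msg>[^"]*)"\]'
)


def timeline(lines, segment_id):
    """Return (timestamp, source, message) tuples mentioning segment_id, oldest first."""
    events = []
    for line in lines:
        if segment_id not in line:
            continue
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip lines in other formats (e.g. glog-style C++ lines)
        ts = datetime.strptime(match.group("ts"), "%Y/%m/%d %H:%M:%S.%f")
        events.append((ts, match.group("src"), match.group("msg")))
    return sorted(events)


# Shortened sample lines taken from the logs in this comment.
lines = [
    '[2024/08/14 09:37:58.006 +00:00] [INFO] [segments/segment_loader.go:541] ["start loading remote..."] [segmentIDs="[451838354632090846]"]',
    '[2024/08/14 09:37:56.144 +00:00] [DEBUG] [segments/segment.go:499] ["search segment..."] [segmentID=451838354632090846]',
    '[2024/08/14 09:37:54.495 +00:00] [INFO] [querynodev2/services.go:492] ["load segments done..."] [segmentID=451838354632090846]',
]

events = timeline(lines, "451838354632090846")
for ts, src, msg in events:
    print(ts.time(), src, msg)
```

Running the same filter over the full querynode log for the failing pod would show whether any other load or release of this segment overlaps with the search.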
any progress?
Made ASan available for the Milvus binary and image (#35627), and I'm trying to reproduce the issue with it.
Some ODR violations (#35549, #35633) were found and fixed (#35610), but I'm not sure whether they are related to this issue.
Found an assertion failure while reproducing:
milvus: /go/src/github.com/milvus-io/milvus/internal/core/src/exec/expression/EvalCtx.h:36: milvus::exec::EvalCtx::EvalCtx(milvus::exec::ExecContext*, milvus::exec::ExprSet*, milvus::RowVector*): Assertion `expr_set_ != nullptr' failed.
It's another, unrelated issue; see #35771. I'll try to reproduce again after the fix.
@zhuwenxing The `panic: context deadline exceeded` on the etcd connection (the QueryNode log at 09:08:21) happened during test initialization, when etcd was not ready yet and no etcd chaos had been injected. It is therefore expected:
[2024-08-14T09:07:23.151Z] + helm install --wait --debug --timeout 600s etcd-followers-pod-failure-17071 milvus/milvus --set image.all.repository=harbor.milvus.io/milvus/milvus --set image.all.tag=master-20240814-c42976ee-amd64 --set metrics.serviceMonitor.enabled=true --set etcd.metrics.enabled=true --set etcd.metrics.podMonitor.enabled=true --set etcd.metrics.podMonitor.namespace=chaos-testing --set quotaAndLimits.enabled=false -f ../cluster-values.yaml -n=chaos-testing
[2024-08-14T09:07:23.154Z] WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
[2024-08-14T09:07:23.154Z] WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
[2024-08-14T09:07:23.154Z] install.go:178: [debug] Original chart version: ""
[2024-08-14T09:07:24.083Z] install.go:195: [debug] CHART PATH: /root/.cache/helm/repository/milvus-4.2.4.tgz
[2024-08-14T09:07:24.083Z]
[2024-08-14T09:07:25.011Z] client.go:128: [debug] creating 42 resource(s)
[2024-08-14T09:07:25.267Z] wait.go:48: [debug] beginning wait for 42 resources with timeout of 10m0s
[2024-08-14T09:07:26.191Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:29.453Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:31.970Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:34.491Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:37.757Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:40.271Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:43.550Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:46.072Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:48.643Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:51.904Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:54.417Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:07:56.929Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:00.194Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:02.721Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:05.544Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:08.061Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:11.335Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:13.851Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:17.119Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:19.633Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:22.152Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:25.424Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:27.939Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:30.455Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:33.720Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:36.233Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:39.498Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:42.013Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:44.525Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
[2024-08-14T09:08:47.794Z] ready.go:277: [debug] Deployment is not ready: chaos-testing/etcd-followers-pod-failure-17071-milvus-datanode. 0 out of 2 expected pods are ready
Can't reproduce after the ODR violations were fixed and ASan was enabled.
https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17504/pipeline/
https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17505/pipeline/
https://qa-jenkins.milvus.io/blue/organizations/jenkins/chaos-test-cron/detail/chaos-test-cron/17506/pipeline/
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.
Not reproduced