
[Bug]: indexnode always crash

Open lzhin opened this issue 8 months ago • 8 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.3.0
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): pulsar
- SDK version(e.g. pymilvus v2.0.0rc2): v2.3.0
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory:
- GPU:
- Others:

Current Behavior

The indexnode always crashes. The error log is:

[2024/05/28 15:33:51.615 +08:00] [INFO] [tracer/tracer.go:71] ["Init tracer finished"] [Exporter=stdout]
[2024/05/28 15:33:51.686 +08:00] [INFO] [sessionutil/session_util.go:841] ["register session success"] [role=indexnode] [key=by-dev/meta/session/indexnode-5906]
[2024/05/28 15:33:52.561 +08:00] [INFO] [indexnode/indexnode_service.go:228] ["Get Index Job Stats"] [traceID=d77a164256a226145aa7b58c1c456f2e] [Unissued=0] [Active=0] [Slot=1]
[2024/05/28 15:33:52.564 +08:00] [INFO] [indexnode/indexnode_service.go:53] ["IndexNode building index ..."] [traceID=5fbdc07ecc696210c452e72f215f5b9c] [ClusterID=by-dev] [IndexBuildID=448526227129723636] [IndexID=0] [IndexName=] [IndexFilePrefix=files/index_files] [IndexVersion=2594] [DataPaths="[files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570147,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570154,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570161,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570168,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570175,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570182,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570189,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570196,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570203,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570210,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570217,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570224,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570231,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570238,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570245,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570252,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570259,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570266,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570273,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570280,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570287,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570294,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570301,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570308,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570315,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570322,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570329,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570336,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570343,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570350,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570357,files/insert_log/444047871088825752/444047871088825753/448526227129320035/102/448526226933570364]"] [TypeParams="[{"key":"dim","value":"128"}]"] [IndexParams="[{"key":"index_type","value":"IVF_SQ8"},{"key":"metric_type","value":"IP"},{"key":"nlist","value":"8192"}]"] [num_rows=63249]
[2024/05/28 15:33:52.567 +08:00] [INFO] [storage/minio_chunk_manager.go:154] ["minio chunk manager init success."] [bucketname=a-bucket] [root=files]
[2024/05/28 15:33:52.567 +08:00] [INFO] [indexnode/chunk_mgr_factory.go:39] ["index node successfully init chunk manager"]
[2024/05/28 15:33:52.567 +08:00] [INFO] [indexnode/task.go:147] ["IndexNode IndexBuilderTask Enqueue"] [buildID=448526227129723636] [segmentID=0]
[2024/05/28 15:33:52.567 +08:00] [INFO] [indexnode/indexnode_service.go:118] ["IndexNode successfully scheduled"] [traceID=5fbdc07ecc696210c452e72f215f5b9c] [IndexBuildID=448526227129723636] [ClusterID=by-dev] [indexName=]
[2024/05/28 15:33:52.567 +08:00] [DEBUG] [indexnode/task_scheduler.go:219] ["process task"] [task=by-dev/448526227129723636]
[2024/05/28 15:33:52.567 +08:00] [INFO] [indexnode/task.go:153] ["Begin to prepare indexBuildTask"] [buildID=448526227129723636] [Collection=0] [SegmentID=0]
[2024/05/28 15:33:52.567 +08:00] [INFO] [indexnode/task.go:181] ["Successfully prepare indexBuildTask"] [buildID=448526227129723636] [Collection=0] [SegmentID=0]
[2024/05/28 15:33:52.571 +08:00] [INFO] [indexnode/indexnode_service.go:228] ["Get Index Job Stats"] [traceID=7f188ea90e32257214f608d2edc25c1e] [Unissued=0] [Active=1] [Slot=0]
[2024/05/28 15:33:52.572 +08:00] [INFO] [indexnode/task.go:322] ["index params are ready"] [buildID=448526227129723636] ["index params"="{"dim":"128","index_type":"IVF_SQ8","metric_type":"IP","nlist":"8192"}"]
I20240528 15:33:52.572669 39 MinioChunkManager.cpp:125] [SEGCORE][InitSDKAPI][milvus] init aws with log level:error
[2024/05/28 15:33:52.715 +08:00] [DEBUG] [indexnode/indexnode_service.go:281] [IndexNode.GetMetrics] [traceID=ab93bc79e625c69a99dad54011cf8e17] [nodeID=5906] [req="{"metric_type":"system_info"}"] [metric_type=system_info] []
[2024/05/28 15:33:53.559 +08:00] [DEBUG] [indexnode/indexnode_service.go:168] ["querying index build task"] [traceID=7c67fcee2bce29de231cbf72a3a66a83] [ClusterID=by-dev] [IndexBuildID=448526227129723636] [state=InProgress] ["fail reason"=]
[2024/05/28 15:33:53.560 +08:00] [INFO] [indexnode/indexnode_service.go:228] ["Get Index Job Stats"] [traceID=1f77cc462c9a8d9f87b44e55e7103508] [Unissued=0] [Active=1] [Slot=0]
I20240528 15:33:53.622552 39 MinioChunkManager.cpp:96] [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2024-05-28 07:33:53.622 CurlHttpClient [140032102819584] Curl returned error code 28 - Timeout was reached
I20240528 15:33:53.622675 39 MinioChunkManager.cpp:96] [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2024-05-28 07:33:53.622 EC2MetadataClient [140032102819584] Http request to retrieve credentials failed
[2024/05/28 15:33:54.559 +08:00] [DEBUG] [indexnode/indexnode_service.go:168] ["querying index build task"] [traceID=46219a076ffb709b948aafca5f9295a4] [ClusterID=by-dev] [IndexBuildID=448526227129723636] [state=InProgress] ["fail reason"=]
[2024/05/28 15:33:54.560 +08:00] [INFO] [indexnode/indexnode_service.go:228] ["Get Index Job Stats"] [traceID=5568cb92d661dba8c068ec2663e3ab2d] [Unissued=0] [Active=1] [Slot=0]
I20240528 15:33:54.622944 39 MinioChunkManager.cpp:96] [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2024-05-28 07:33:54.622 CurlHttpClient [140032102819584] Curl returned error code 28 - Timeout was reached
I20240528 15:33:54.623031 39 MinioChunkManager.cpp:96] [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2024-05-28 07:33:54.623 EC2MetadataClient [140032102819584] Http request to retrieve credentials failed
I20240528 15:33:54.623049 39 MinioChunkManager.cpp:96] [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2024-05-28 07:33:54.623 EC2MetadataClient [140032102819584] Can not retrieve resource from http://169.254.169.254/latest/meta-data/placement/availability-zone
I20240528 15:33:54.623261 39 MinioChunkManager.cpp:285] [SEGCORE][MinioChunkManager][milvus] init MinioChunkManager with parameter[endpoint: , default_bucket_name:'a-bucket', use_secure:'false']
I20240528 15:33:54.623351 39 factory.cc:20] [KNOWHERE][Create][milvus] create knowhere index IVF_SQ8
W20240528 15:33:54.623395 39 thread_pool.h:142] [KNOWHERE][GetGlobalSearchThreadPool][milvus] Global Search ThreadPool has not been initialized yet, init it with threads num: 48
I20240528 15:33:54.623541 39 ThreadPools.h:51] [SEGCORE][SetUpCoefficients][milvus] Init ThreadPools, high_priority_co:10, middle:5, low:1
I20240528 15:33:54.623559 39 ThreadPool.h:43] [SEGCORE][ThreadPool][milvus] Init thread pool:high_priority_thread_pool with min worker num:16 and max worker num:160
I20240528 15:33:54.683723 150 MinioChunkManager.cpp:96] [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2024-05-28 07:33:54.683 AWSClient [140031157065472] HTTP response code: 404
Resolved remote host IP address: 10.137.100.177
Request ID:
Exception name:
Error message: No response body.
10 response headers:
accept-ranges : bytes
content-length : 0
content-security-policy : block-all-mixed-content
date : Tue, 28 May 2024 07:33:54 GMT
server : MinIO
strict-transport-security : max-age=31536000; includeSubDomains
vary : Accept-Encoding
x-amz-request-id : 17D396C0149A9640
x-content-type-options : nosniff
x-xss-protection : 1; mode=block
I20240528 15:33:54.683760 155 MinioChunkManager.cpp:96] [SEGCORE][ProcessFormattedStatement][milvus] [AWS LOG] [ERROR] 2024-05-28 07:33:54.683 AWSClient [140031115101952] HTTP response code: 404
Resolved remote host IP address: 10.137.100.177
Request ID:
Exception name:
Error message: No response body.
10 response headers:
accept-ranges : bytes
content-length : 0
content-security-policy : block-all-mixed-content
date : Tue, 28 May 2024 07:33:54 GMT
server : MinIO
strict-transport-security : max-age=31536000; includeSubDomains
vary : Accept-Encoding
x-amz-request-id : 17D396C0149A2BF2
x-content-type-options : nosniff
x-xss-protection : 1; mode=block
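
Since MinIO answers the object requests with 404, it may help to check which of the objects listed in the `DataPaths` field of the log above actually exist in the bucket. The following is a minimal sketch for pulling those object keys out of a pasted log, assuming the log format shown above. The function name `extract_data_paths` and the shortened sample paths are hypothetical and only for illustration.

```python
import re

# Matches the DataPaths field of an "IndexNode building index ..." entry,
# e.g. [DataPaths="[files/insert_log/...,files/insert_log/...]"]
DATA_PATHS_RE = re.compile(r'\[DataPaths="\[([^\]]*)\]"\]')


def extract_data_paths(log_text):
    """Return the object keys listed in the first DataPaths field found."""
    # A pasted log may be wrapped in the middle of a path, so drop the
    # newlines before searching. This is only safe for this one field.
    joined = log_text.replace("\n", "")
    match = DATA_PATHS_RE.search(joined)
    if match is None:
        return []
    return [p.strip() for p in match.group(1).split(",") if p.strip()]


# Shortened, made-up sample in the same shape as the real log entry.
sample = (
    '[IndexFilePrefix=files/index_files] '
    '[DataPaths="[files/insert_log/1/2/3/102/4,'
    'files/insert_log/1/2/3/102/5]"] [num_rows=2]'
)

for key in extract_data_paths(sample):
    print(key)
```

Each returned key could then be looked up in the `a-bucket` bucket with an S3-compatible client, for example `mc stat`, to see which objects are missing.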

The MinIO error is:

API: PutObjectPart(bucket=a-bucket, object=files/insert_log/444047871088825752/444047871088825753/448526226933464993/102/448526226933465224)
Time: 00:05:22 UTC 04/01/2024
DeploymentID: 473e13f8-1cf4-423f-a4c5-8b64f93974fc
RequestID: 17C1FF3194F59168
RemoteHost: 10.137.100.84
UserAgent: MinIO (linux; amd64) minio-go/v7.0.56
Error: file not found (cmd.StorageErr)
       uploadID=2f230ea3-cc63-43b2-b5c7-745f2d25acd6, uploadIDDir=/minio_data/.minio.sys/multipart/605e819410f58ec1207e650b27086452f52dcf7650daebc83e8b4c4570a3381e/2f230ea3-cc63-43b2-b5c7-745f2d25acd6
       1: cmd/fs-v1-multipart.go:99:cmd.(*FSObjects).backgroundAppend()

API: PutObjectPart(bucket=a-bucket, object=files/insert_log/444047871088825752/444047871088825753/448526226933468793/102/448526226933468808)
Time: 00:04:37 UTC 04/02/2024
DeploymentID: 473e13f8-1cf4-423f-a4c5-8b64f93974fc
RequestID: 17C24DBBE4EE0492
RemoteHost: 10.137.100.146
UserAgent: MinIO (linux; amd64) minio-go/v7.0.56
Error: file not found (cmd.StorageErr)
       uploadID=51169295-f258-4843-8bcd-2a5e8fd34fcc, uploadIDDir=/minio_data/.minio.sys/multipart/cc9394acbfc5edd3fb29dab3a96995ac3feb8d4a82cb8fa0aa63d79c0998b9de/51169295-f258-4843-8bcd-2a5e8fd34fcc
       1: cmd/fs-v1-multipart.go:99:cmd.(*FSObjects).backgroundAppend()

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

lzhin, May 28 '24 07:05