
[Bug]: Query failed: Timestamp lag too large lag(129h36m35.55s) max(24h0m0s)

Open weiZhenkun opened this issue 1 year ago • 5 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.3.1
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka): external kafka
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.1
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

The query fails with the following error:

query error: failed to search/query delegator 10 for channel demo-cluster3-rootcoord-dml_1_445021135592215568v0: fail to Search, QueryNode ID=10, reason=Timestamp lag too large lag(129h36m35.55s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable

Expected Behavior

The query should succeed without errors.

Steps To Reproduce

query error.

Milvus Log

query error.

Anything else?

query error.
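For context, this is how the failure surfaces on the client side. A minimal pymilvus sketch, assuming a reachable Milvus endpoint and an existing loaded collection (the host, collection name, and filter expression are placeholders, not taken from this report):

```python
from pymilvus import connections, Collection, MilvusException

# Placeholder connection details and collection name, for illustration only.
connections.connect(alias="default", host="localhost", port="19530")
collection = Collection("demo_collection")
collection.load()

try:
    results = collection.query(expr="id > 0", output_fields=["id"], limit=10)
    print(results)
except MilvusException as e:
    # With the problem described in this issue, the exception wraps
    # "Timestamp lag too large lag(...) max(24h0m0s)".
    print(f"query failed: {e}")
```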

weiZhenkun avatar Jan 02 '24 07:01 weiZhenkun

It seems that the tt (time tick) lag is huge in your Milvus cluster, @weiZhenkun. Could you please refer to this doc to export the whole Milvus logs for investigation? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.
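If you prefer to script the export, here is a minimal sketch assuming a docker-compose deployment with docker-compose on the PATH; it is equivalent to running the command above by hand:

```python
import subprocess

# Capture logs from all docker-compose services into milvus.log,
# equivalent to: docker-compose logs > milvus.log
with open("milvus.log", "wb") as log_file:
    subprocess.run(["docker-compose", "logs"], stdout=log_file, check=True)
```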

/assign @weiZhenkun /unassign

yanliang567 avatar Jan 02 '24 11:01 yanliang567

/assign @congqixia
I got the logs offline; please ping me when you need them, @congqixia.

/unassign @weiZhenkun

yanliang567 avatar Jan 02 '24 11:01 yanliang567

Has the problem been solved? I had the same problem.

cuiyuan605 avatar Mar 10 '24 10:03 cuiyuan605

Has the problem been solved? I had the same problem.

There might be many different situations that could trigger this issue. If the latest 2.3.11 cannot fix it, please share your detailed logs with us.

xiaofan-luan avatar Mar 10 '24 23:03 xiaofan-luan

I also meet this problem , version 2.3.4 or 2.4.1 ,detail logs below [2024/05/16 01:02:42.927 +08:00] [WARN] [proxy/task_scheduler.go:469] ["Failed to execute task: "] [traceID=40745a76672aa98cf2ef90ded6bfd6bb] [error="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 63\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"] [2024/05/16 01:02:42.927 +08:00] [WARN] [proxy/impl.go:2919] ["Search failed to WaitToFinish"] [traceID=40745a76672aa98cf2ef90ded6bfd6bb] [role=proxy] [db=default] [collection=zzp] [partitions="[KB3ebe25f7f0eb4401b74a6f5fba54759e,KBb39d6898ef1042f1a7d69b69ae342d5c,KBba5023df84fb4033849b62b6d331d794]"] [dsl=] [len(PlaceholderGroup)=3084] [OutputFields="[chunk_id,file_id,file_name,file_path,timestamp,content]"] 
[search_params="[{\"key\":\"topk\",\"value\":\"100\"},{\"key\":\"params\",\"value\":\"{\\\"nprobe\\\":256}\"},{\"key\":\"round_decimal\",\"value\":\"-1\"},{\"key\":\"ignore_growing\",\"value\":\"False\"},{\"key\":\"metric_type\",\"value\":\"L2\"},{\"key\":\"anns_field\",\"value\":\"embedding\"}]"] [guarantee_timestamp=1] [nq=1] [error="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 63\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"]

lucifax301 avatar May 15 '24 17:05 lucifax301

> I also meet this problem , version 2.3.4 or 2.4.1 , detail logs below […]

We will need more logs from the QueryNode and DataNode to understand why this happens. This is a protection mechanism indicating that the query nodes (or the nodes handling deletes) cannot keep up with incoming inserts and deletes; it is usually caused by too many writes or by blocked consumption.
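To make that mechanism concrete, here is a hypothetical sketch of such a guard (illustrative only, not Milvus's actual implementation): the serving node tracks the timestamp of the last message it consumed from the DML channel and rejects search/query requests once its lag behind wall-clock time exceeds a configured maximum, which is 24h by default according to the max(24h0m0s) in the error message.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_TS_LAG = timedelta(hours=24)  # corresponds to max(24h0m0s) in the error message


class TimestampLagTooLarge(Exception):
    pass


def check_ts_lag(last_consumed: datetime, now: Optional[datetime] = None) -> None:
    """Refuse to serve if message consumption has fallen too far behind real time."""
    now = now or datetime.now(timezone.utc)
    lag = now - last_consumed
    if lag > MAX_TS_LAG:
        # Note: timedelta formatting differs from the Go-style duration in the real log.
        raise TimestampLagTooLarge(f"Timestamp lag too large lag({lag}) max({MAX_TS_LAG})")


# A lag of ~129h, as reported in this issue, trips the guard:
try:
    check_ts_lag(datetime.now(timezone.utc) - timedelta(hours=129, minutes=36))
except TimestampLagTooLarge as e:
    print(e)
```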

xiaofan-luan avatar May 18 '24 09:05 xiaofan-luan

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jun 18 '24 03:06 stale[bot]