[Bug]: Query failed: Timestamp lag too large lag(129h36m35.55s) max(24h0m0s)
Is there an existing issue for this?
- [X] I have searched the existing issues
Environment
- Milvus version: 2.3.1
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): external kafka
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.1
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
Queries fail with the following error:
query error: failed to search/query delegator 10 for channel demo-cluster3-rootcoord-dml_1_445021135592215568v0: fail to Search, QueryNode ID=10, reason=Timestamp lag too large lag(129h36m35.55s) max(24h0m0s): attempt #1: no available shard delegator found: service unavailable
Expected Behavior
Queries should return results without a timestamp-lag error.
Steps To Reproduce
Run a search/query against the cluster; it fails with the error above. A minimal client-side sketch is shown below.
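For reference, a minimal client-side sketch of the kind of search that surfaces this error. The connection parameters, collection name, vector field, and dimension below are placeholders for illustration only, not values from the actual deployment:

```python
from pymilvus import connections, Collection, MilvusException

# Placeholder connection parameters and schema names -- adjust to your deployment.
connections.connect(host="127.0.0.1", port="19530")
collection = Collection("demo")          # hypothetical collection name
collection.load()

try:
    collection.search(
        data=[[0.0] * 128],              # one query vector; 128 is a placeholder dimension
        anns_field="embedding",          # hypothetical vector field name
        param={"metric_type": "L2", "params": {"nprobe": 256}},
        limit=100,
    )
except MilvusException as e:
    # When the shard delegator's timestamp lag exceeds the limit, the search fails here with
    # "... Timestamp lag too large lag(129h36m35.55s) max(24h0m0s) ...".
    print(e)
```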
Milvus Log
No response
Anything else?
No response
It seems that the time tick (tt) lag is huge in your Milvus cluster, @weiZhenkun. Could you please refer to this doc and export the whole Milvus logs for investigation? For Milvus installed with docker-compose, you can use docker-compose logs > milvus.log to export the logs.
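If it is easier to script, something along these lines should capture the same logs (a sketch only; it assumes docker-compose is on the PATH and is run from the directory containing the compose file):

```python
import subprocess

# Capture logs from all docker-compose services into a single file,
# equivalent to running `docker-compose logs > milvus.log` by hand.
with open("milvus.log", "wb") as log_file:
    subprocess.run(["docker-compose", "logs", "--no-color"], stdout=log_file, check=True)
```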
/assign @weiZhenkun /unassign
/assign @congqixia I got the logs offline; please ping me when you need them, @congqixia.
/unassign @weiZhenkun
Has the problem been solved? I had the same problem.
There might be many different situations that could trigger this issue. If the latest 2.3.11 does not fix it, please offer us detailed logs.
I also hit this problem on versions 2.3.4 and 2.4.1; detailed logs below:
[2024/05/16 01:02:42.927 +08:00] [WARN] [proxy/task_scheduler.go:469] ["Failed to execute task: "] [traceID=40745a76672aa98cf2ef90ded6bfd6bb] [error="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 63\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"] [2024/05/16 01:02:42.927 +08:00] [WARN] [proxy/impl.go:2919] ["Search failed to WaitToFinish"] [traceID=40745a76672aa98cf2ef90ded6bfd6bb] [role=proxy] [db=default] [collection=zzp] [partitions="[KB3ebe25f7f0eb4401b74a6f5fba54759e,KBb39d6898ef1042f1a7d69b69ae342d5c,KBba5023df84fb4033849b62b6d331d794]"] [dsl=] [len(PlaceholderGroup)=3084] [OutputFields="[chunk_id,file_id,file_name,file_path,timestamp,content]"] 
[search_params="[{\"key\":\"topk\",\"value\":\"100\"},{\"key\":\"params\",\"value\":\"{\\\"nprobe\\\":256}\"},{\"key\":\"round_decimal\",\"value\":\"-1\"},{\"key\":\"ignore_growing\",\"value\":\"False\"},{\"key\":\"metric_type\",\"value\":\"L2\"},{\"key\":\"anns_field\",\"value\":\"embedding\"}]"] [guarantee_timestamp=1] [nq=1] [error="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large"] [errorVerbose="failed to search: failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0: fail to search on QueryNode 63: Timestamp lag too large\n(1) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).Execute\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:512\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).processTask\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:466\n | github.com/milvus-io/milvus/internal/proxy.(*taskScheduler).queryLoop.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_scheduler.go:545\n | github.com/milvus-io/milvus/pkg/util/conc.(*Pool[...]).Submit.func1\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/conc/pool.go:81\n | github.com/panjf2000/ants/v2.(*goWorker).run.func1\n | \t/go/pkg/mod/github.com/panjf2000/ants/[email protected]/worker.go:67\nWraps: (2) failed to search\nWraps: (3) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:188\n | [...repeated from below...]\nWraps: (4) failed to search/query delegator 63 for channel by-dev-rootcoord-dml_0_449187292454152686v0\nWraps: (5) attached stack trace\n -- stack trace:\n | github.com/milvus-io/milvus/internal/proxy.(*searchTask).searchShard\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/task_search.go:696\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry.func1\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:180\n | github.com/milvus-io/milvus/pkg/util/retry.Do\n | \t/go/src/github.com/milvus-io/milvus/pkg/util/retry/retry.go:44\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).ExecuteWithRetry\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:154\n | github.com/milvus-io/milvus/internal/proxy.(*LBPolicyImpl).Execute.func2\n | \t/go/src/github.com/milvus-io/milvus/internal/proxy/lb_policy.go:218\n | golang.org/x/sync/errgroup.(*Group).Go.func1\n | \t/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:75\n | runtime.goexit\n | \t/usr/local/go/src/runtime/asm_amd64.s:1598\nWraps: (6) fail to search on QueryNode 63\nWraps: (7) Timestamp lag too large\nError types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) merr.milvusError"]
We will need more logs from the querynode and datanode to understand why this happens. This error comes from a protection mechanism: it means the querynodes (or datanodes) cannot catch up with the incoming inserts and deletes, usually because of too many writes or blocked consumption from the message queue.
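To make that mechanism concrete, here is a simplified illustration in Python of the guard the shard delegator applies. This is not the actual Go implementation in Milvus; the names and the exception are purely illustrative:

```python
from datetime import datetime, timedelta

# The delegator tracks a "safe timestamp" (tsafe) that only advances as the querynode
# consumes inserts/deletes from the message queue. If tsafe falls too far behind the
# timestamp a request needs, the request is rejected instead of waiting forever.
MAX_TS_LAG = timedelta(hours=24)   # matches the max(24h0m0s) shown in the error message

def check_serviceable(request_ts: datetime, tsafe: datetime,
                      max_lag: timedelta = MAX_TS_LAG) -> None:
    """Reject a search/query whose required timestamp is too far ahead of tsafe."""
    lag = request_ts - tsafe
    if lag > max_lag:
        # This is the situation behind "Timestamp lag too large": consumption of the
        # DML channel is stalled or cannot keep up with the write rate.
        raise RuntimeError(f"Timestamp lag too large lag({lag}) max({max_lag})")
    # Otherwise the delegator waits until tsafe catches up, then serves the request.

# Example: tsafe is ~129h36m behind the requested timestamp, so the guard fires.
now = datetime.now()
try:
    check_serviceable(request_ts=now, tsafe=now - timedelta(hours=129, minutes=36))
except RuntimeError as err:
    print(err)   # Timestamp lag too large lag(5 days, 9:36:00) max(1 day, 0:00:00)
```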
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen