[Bug]: [laion1b-test] GC has not been triggered for a long time

Open ThreadDao opened this issue 1 year ago • 9 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Environment

- Milvus version: 2.3-20231229-7a192da8-amd64
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):   pulsar 
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

  1. Ran concurrent insert, delete, flush, query, and search against a collection with 50 million rows. The number of segments in the collection increased sharply, after which no new data was added. laion1b-test-read-write-new-8a

  2. After a period of index building and compaction, the number of segments still needed went down, but the number of dropped segments did not decrease. I suspect something is blocking GC. (image)

grafana metrics

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

ThreadDao avatar Jan 03 '24 12:01 ThreadDao

@ThreadDao You can try building an image from the latest 2.3 branch; this should already be fixed there. Linked PR: https://github.com/milvus-io/milvus/pull/29557

SimFG avatar Jan 03 '24 13:01 SimFG

/assign @ThreadDao
please help to verify
/unassign

yanliang567 avatar Jan 04 '24 00:01 yanliang567

@SimFG I thought the problem with this issue was that something might be blocking the GC rather than the lack of concurrency in the GC?

ThreadDao avatar Jan 04 '24 03:01 ThreadDao

@ThreadDao This is caused by too many dropped segments: cleanup is relatively slow, so it blocks the GC.

SimFG avatar Jan 04 '24 05:01 SimFG

@ThreadDao @chyezh From the server log, it looks like the GC may be blocked at the "recycle unused indexes" step. loglink

SimFG avatar Jan 04 '24 12:01 SimFG

The GC task is blocked in scan, while listing MinIO objects.

goroutine 7744 [chan receive]:
github.com/milvus-io/milvus/internal/storage.(*MinioObjectStorage).ListObjects(0xc0008dc280, {0x555e678, 0xc02db23720}, {0xc000418770, 0xd}, {0xc02d1a6be0, 0x10}, 0x1)
	/go/src/github.com/milvus-io/milvus/internal/storage/minio_object_storage.go:190 +0x525
github.com/milvus-io/milvus/internal/storage.(*RemoteChunkManager).listObjects(0xc0008eb530, {0x555e678, 0xc02db23720}, {0xc000418770, 0xd}, {0xc02d1a6be0, 0x10}, 0x37?)
	/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:393 +0xca
github.com/milvus-io/milvus/internal/storage.(*RemoteChunkManager).ListWithPrefix(0xc079463768?, {0x555e678?, 0xc02db23720?}, {0xc02d1a6be0?, 0xc02be38f30?}, 0x2c?)
	/go/src/github.com/milvus-io/milvus/internal/storage/remote_chunk_manager.go:321 +0x3f
github.com/milvus-io/milvus/internal/datacoord.(*garbageCollector).scan(0xc059518fc0)
	/go/src/github.com/milvus-io/milvus/internal/datacoord/garbage_collector.go:218 +0x39c
github.com/milvus-io/milvus/internal/datacoord.(*garbageCollector).work(0xc059518fc0)
	/go/src/github.com/milvus-io/milvus/internal/datacoord/garbage_collector.go:153 +0xad9
created by github.com/milvus-io/milvus/internal/datacoord.(*garbageCollector).start.func1
	/go/src/github.com/milvus-io/milvus/internal/datacoord/garbage_collector.go:96 +0x6d

chyezh avatar Jan 05 '24 03:01 chyezh

Listing every object in a huge MinIO bucket takes too long. We need to paginate the listing to avoid this.

chyezh avatar Jan 05 '24 04:01 chyezh

Let's take this in the newer fix

xiaofan-luan avatar Jan 05 '24 07:01 xiaofan-luan

The new fix has been merged into master. @ThreadDao, please verify it.

chyezh avatar Apr 26 '24 02:04 chyezh