milvus
milvus copied to clipboard
[Enhancement]: Should not use NumOfRows in segment info.
Is there an existing issue for this?
- [X] I have searched the existing issues
What would you like to be added?
Current NumOfRows in segment info is very important meta, if it is calculated wrongly, then segment can not be loaded.
My suggestion is to add all rows in binlogs to calculate rows in a segment.
because each file is atomiclly maintained and very easy to maintain it correct.
Why is this needed?
simply the logic, make everything consistent
Anything else?
No response
@congqixia @soothing-rain what do you guys think?
@xiaofan-luan NumOfRows
of segment info is used to estimate segment loaded memory usage previously, which is replaced by binlog size. Actually row count is defined by binlog
content and set in segcore call named LoadFieldData
or UpdateSealedSegmentIndex
.
Currently NumOfRows
will not affect the progress of loading segment
@xiaofan-luan
NumOfRows
of segment info is used to estimate segment loaded memory usage previously, which is replaced by binlog size. Actually row count is defined bybinlog
content and set in segcore call namedLoadFieldData
orUpdateSealedSegmentIndex
. CurrentlyNumOfRows
will not affect the progress of loading segment
So IIUC, since the NumOfRows
field is no longer used to estimate segment memory usage, it is safe to replace NumOfRows
with the sum of binlog rows?
@xiaofan-luan I agree having a single source of truth is cleaner and less-error prone. We should do that :)
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
/reopen
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
/reopen
/assign @wayblink
@soothing-rain: Reopened this issue.
In response to this:
/reopen
/assign @wayblink
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/unassign @wayblink
This turns out to be some we have to do.
My plan is to do this in two phases:
- (P0) Not removing
NumOfRows
meta. But make sure DataCoord always store correct # of rows, and always pass correct segment # of rows to QueryCoord and IndexCoord. To do this, we re-calculate segments' # of rows before saving segmentInfo to Etcd & passing segmentInfo to other services. - (P2) Completely deprecate
NumOfRows
meta.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
/reopen
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Rotten issues close after 30d of inactivity. Reopen the issue with /reopen
.