milvus icon indicating copy to clipboard operation
milvus copied to clipboard

[Enhancement]: Should not use NumOfRows in segment info.

Open xiaofan-luan opened this issue 2 years ago • 13 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

What would you like to be added?

Current NumOfRows in segment info is very important meta, if it is calculated wrongly, then segment can not be loaded.

My suggestion is to add all rows in binlogs to calculate rows in a segment.

because each file is atomiclly maintained and very easy to maintain it correct.

Why is this needed?

simply the logic, make everything consistent

Anything else?

No response

xiaofan-luan avatar Jul 06 '22 14:07 xiaofan-luan

@congqixia @soothing-rain what do you guys think?

xiaofan-luan avatar Jul 06 '22 14:07 xiaofan-luan

@xiaofan-luan NumOfRows of segment info is used to estimate segment loaded memory usage previously, which is replaced by binlog size. Actually row count is defined by binlog content and set in segcore call named LoadFieldData or UpdateSealedSegmentIndex. Currently NumOfRows will not affect the progress of loading segment

congqixia avatar Jul 07 '22 01:07 congqixia

@xiaofan-luan NumOfRows of segment info is used to estimate segment loaded memory usage previously, which is replaced by binlog size. Actually row count is defined by binlog content and set in segcore call named LoadFieldData or UpdateSealedSegmentIndex. Currently NumOfRows will not affect the progress of loading segment

So IIUC, since the NumOfRows field is no longer used to estimate segment memory usage, it is safe to replace NumOfRows with the sum of binlog rows?

@xiaofan-luan I agree having a single source of truth is cleaner and less-error prone. We should do that :)

soothing-rain avatar Jul 07 '22 02:07 soothing-rain

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Aug 06 '22 03:08 stale[bot]

/reopen

soothing-rain avatar Aug 06 '22 16:08 soothing-rain

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Sep 09 '22 22:09 stale[bot]

/reopen

/assign @wayblink

soothing-rain avatar Nov 07 '22 11:11 soothing-rain

@soothing-rain: Reopened this issue.

In response to this:

/reopen

/assign @wayblink

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sre-ci-robot avatar Nov 07 '22 11:11 sre-ci-robot

/unassign @wayblink

soothing-rain avatar Nov 24 '22 11:11 soothing-rain

This turns out to be some we have to do.

My plan is to do this in two phases:

  1. (P0) Not removing NumOfRows meta. But make sure DataCoord always store correct # of rows, and always pass correct segment # of rows to QueryCoord and IndexCoord. To do this, we re-calculate segments' # of rows before saving segmentInfo to Etcd & passing segmentInfo to other services.
  2. (P2) Completely deprecate NumOfRows meta.

soothing-rain avatar Nov 24 '22 11:11 soothing-rain

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Jan 23 '23 04:01 stale[bot]

/reopen

congqixia avatar Jan 30 '23 02:01 congqixia

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Mar 17 '23 04:03 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Aug 02 '23 05:08 stale[bot]

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. Rotten issues close after 30d of inactivity. Reopen the issue with /reopen.

stale[bot] avatar Sep 03 '23 18:09 stale[bot]