valkey icon indicating copy to clipboard operation
valkey copied to clipboard

Fix incorrect lag and entries-read value with tombstone

Open artikell opened this issue 1 year ago • 13 comments

This PR fix three issues(https://github.com/valkey-io/valkey/issues/641):

  • Fixed the incorrect entries-read count in the case of an empty stream.
  • Ensured the real-time accuracy of the entries-read value in XINFO, preventing inconsistencies between lag and entries-read. Consequently, the return results of some unit tests (UT) have been modified.
  • Fixed the counting error when last-id < max_deleted_entry_id < first_id.

artikell avatar Jun 23 '24 18:06 artikell

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 70.87%. Comparing base (ae2d421) to head (f5c2ba3). Report is 804 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #685      +/-   ##
============================================
+ Coverage     70.03%   70.87%   +0.83%     
============================================
  Files           110      123      +13     
  Lines         60076    65929    +5853     
============================================
+ Hits          42076    46726    +4650     
- Misses        18000    19203    +1203     
Files with missing lines Coverage Δ
src/t_stream.c 92.88% <100.00%> (-0.27%) :arrow_down:

... and 115 files with indirect coverage changes

:rocket: New features to boost your workflow:
  • :snowflake: Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • :package: JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

codecov[bot] avatar Jun 23 '24 18:06 codecov[bot]

Apologies for taking this long to get to this. I'm not super confident in getting all these edge cases fixed in one go. If you don't mind, could you split out the changes into separate PR and target this one to fix only the lag issue described in #641 . @valkey-io/core-team Could one of you with more expertise have a look at it?

Disclaimer: I'm also reading this code flow in detail for the first time.

Got it, thanks for the review. I'll try to split it in the next few days.

artikell avatar Jan 04 '25 12:01 artikell

We can merge #1952 first, backport it and include it in patch releases 7.2.9, 8.0.3 and 8.1.1.

After that, we can merge this PR and include it in 9.0 because it has a small behavior change. Sounds good?

zuiderkwast avatar Apr 16 '25 18:04 zuiderkwast

We can merge #1952 first, backport it and include it in patch releases 7.2.9, 8.0.3 and 8.1.1.

After that, we can merge this PR and include it in 9.0 because it has a small behavior change. Sounds good?

LGTM, We also have to pay attention to max-deleted-entry-id incompatible changes in #1952.

artikell avatar Apr 16 '25 22:04 artikell

@zuiderkwast Can we discuss this in the next weekly meeting to understand the little changes you referred to?

madolson avatar Jul 21 '25 14:07 madolson

@madolson I think is already solved in https://github.com/valkey-io/valkey/pull/1952

nesty92 avatar Jul 21 '25 14:07 nesty92

As far as I remember, there are some differences. Let me revise this PR to facilitate the discussion. If there are no issues, I'll close it.

artikell avatar Jul 22 '25 02:07 artikell

@zuiderkwast Can we discuss this in the next weekly meeting to understand the little changes you referred to?

Not sure if I will be able join. It's about the first bullet in the top comment. The small changes to test cases in this PR illustrate the small behaviour change. The XINFO field entries-read returns a number instead of null, which looks like a correction more than anything.

zuiderkwast avatar Jul 22 '25 15:07 zuiderkwast

As @zuiderkwast mentioned, currently, the entries-read and lag fields of xinfo group are calculated separately.

            if (cg->entries_read != SCG_INVALID_ENTRIES_READ) {
                addReplyLongLong(c, cg->entries_read);
            } else {
                addReplyNull(c);
            }
            addReplyBulkCString(c, "lag");
            streamReplyWithCGLag(c, s, cg);

Theoretically, if these two fields cannot be calculated, they should both be null. However, this is not the case now, which seems to be an anomaly.

We need to discuss whether to amend this logic: ensuring that either both lag and entries-read are null, or both have values.

cc @madolson @nesty92

artikell avatar Aug 04 '25 02:08 artikell

Yeah, this behavior change seems OK to me.

madolson avatar Oct 13 '25 14:10 madolson

@artikell Can you fix the merge conflicts, please?

zuiderkwast avatar Oct 13 '25 20:10 zuiderkwast

@artikell Can you fix the merge conflicts, please?

Okay, I have time to fix this today.

artikell avatar Oct 14 '25 02:10 artikell

As @zuiderkwast mentioned, currently, the entries-read and lag fields of xinfo group are calculated separately.

            if (cg->entries_read != SCG_INVALID_ENTRIES_READ) {
                addReplyLongLong(c, cg->entries_read);
            } else {
                addReplyNull(c);
            }
            addReplyBulkCString(c, "lag");
            streamReplyWithCGLag(c, s, cg);

Theoretically, if these two fields cannot be calculated, they should both be null. However, this is not the case now, which seems to be an anomaly.

We need to discuss whether to amend this logic: ensuring that either both lag and entries-read are null, or both have values.

cc @madolson @nesty92

I don't see the problem with the two having different values, since they measure different things. The read entries will tell you how many items the consumer has read, and the delay will tell you an estimation how much is left to read.

nesty92 avatar Oct 15 '25 10:10 nesty92