nats-server
nats-server copied to clipboard
Allow custom StreamSource heartbeat
- [x] Link to issue, e.g.
Resolves #NNN - [ ] Documentation added (if applicable)
- [ ] Tests added
- [x] Branch rebased on top of current main (
git pull --rebase origin main) - [x] Changes squashed to a single commit (described here)
- [x] Build is green in Travis CI
- [x] You have certified that the contribution is your original work and that you license the work to the project under the Apache 2 license
Main issue: #4094
Changes proposed in this pull request:
- Added
Heartbeatfield toStreamSource - Added
getMirrorHeartbeathelper function - Added
getStreamSourceHeartbeatandgetSourceHeartbeathelper functions - Use
getMirrorHeartbeatto define mirror heartbeat and related health checks - Use
getStreamSourceHeartbeatandgetSourceHeartbeatto define stream sources heartbeat and related health checks
CC @paolobarbolini
Except for the failing CI (for which TBH I am a bit perplexed), feel free to give me some feedback about the value of the PR (and the relative issue). If you think it is basically fine, I will add some tests for the new behavior.
The data race in TestJetStreamClusterSourceWithOptStartTime is legit.
@derekcollison can you give me a hint about a possible cause? The reason I don't understand is that, without specifying any idle_heartbeat parameter, the behavior should have not been changed in any possible way. What I am not seeing?
The stack for the datarace will give you the info needed, in terms of when it violates and what was previous operation, etc..
@derekcollison Thanks for the patient! There was just a single point of data race it slipped through. Tests are now green.
Let me know if the change is reasonable for you, in that case I can add a few tests.
@derekcollison Sorry to bother, but I am banging my head against the tests and I probably need a hint from someone who knows the codebase.
The ideal test would be to use both (relatively) small and large custom heartbeat and monitor the behavior of the consumers in order to check everything is working as expected. For instance, I should expect a failing test if I change back the heartbeat value to the previously default one in the checks for stalled streams.
The issue I am facing is I am unable to verify this situation. For instance, the test TestJetStreamPushConsumerIdleHeartbeats exactly performs this task, but it has direct access to the testing consumer (because it just creates it) in order to subscribe to the replies and check for the heartbeat. On the other hand, when a stream is configured to use a mirror or some sources, the created consumer is not exposed in any way, and I am unable to get the heartbeat messages just subscribing to >.
I honestly do not know if I am missing something obvious or the situation is really a bit tricky, in any case I think people with the knowledge of the codebase could give me very useful information. Even if my changes have a small impact, I think that there are a couple of sharp edges that really need some additional tests.
Thank you in advance!
Yes can be tricky in there, the consumer is gettable, but a bit nuanced.
Can you point me to a line in your test where you would need direct access and I can probably help.
Sure, and thank you for helping me! :blush:
I just pushed a wip commit, which is pretty much stolen from TestJetStreamPushConsumerIdleHeartbeats.
As you can see I am trying to replicate the subscription to the consumer in order to get the heartbeat messages.
However (and pretty obviously), in the original test the subject is used as a delivery subject for the consumer. However, in my case the subject of the consumer is changed a bit and other parameters are set -- and my noobness with the codebase does not let me grasp what is the specific reason I am unable to see the the heartbeat messages from the inbox subscription :sweat_smile:.
I would be really happy to solve this without bothering you, but I think I need some help :disappointed:. Thank you in advance for all the support!
We have a customer that is looking at this functionality so we will introduce for 2.10 in some for or fashion.
Looks like the final bit to remove is the local replace in go.mod?
@bruth Sorry for the delay. I just resynced the repo. Regarding to your last comment, I think that this PR needs a change in nats.go to correctly support the new Heartbeat field before being merged, and the change is pretty simple as you can see from here.
If you want I can open a PR in nats.go in order to add that field and link to this PR, just let me know.
The approach of #4352 looks interesting. Since this didn't make it in 2.10 we're going to patch nats-server in our Yocto build by hard-coding a higher value for heartbeat interval. We'll see whether this option makes sense to be configured globally or at a per stream-by-stream level.