feat(redis source): add reconnecting pubsub session for channel `data_type`
Summary
This PR improves the reliability of the Redis source when `data_type = "channel"`.
Previously, if the Redis server restarted or the connection was lost, Vector permanently stopped consuming messages until it was manually restarted.
This change introduces a session-based model that automatically reconnects, re-subscribes to the configured channel, and resumes message consumption without operator intervention. It also adds shutdown-aware backoff logic, graceful unsubscribe on shutdown, and clearer logs when recovery occurs.
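The session model described above (connect, subscribe, consume, reconnect and re-subscribe on loss, back off between failed attempts, unsubscribe on shutdown) can be sketched as a small synchronous loop. This is a minimal, dependency-free sketch under stated assumptions, not the PR's code: `PubSubConn`, `ScriptedConn`, `run_session`, `backoff` and `sleep_or_shutdown` are hypothetical names, the trait is NOT the `redis` crate API, and the 5 ms base / 50 ms cap are arbitrary illustration values.

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical pub/sub connection interface (NOT the `redis` crate API).
pub trait PubSubConn {
    fn subscribe(&mut self, channel: &str) -> Result<(), String>;
    /// `Ok(Some(msg))` = message, `Ok(None)` = nothing yet, `Err` = connection lost.
    fn next_message(&mut self) -> Result<Option<String>, String>;
    fn unsubscribe(&mut self, channel: &str) -> Result<(), String>;
}

/// Capped exponential backoff: `base * 2^(attempt - 1)`, never above `max`.
pub fn backoff(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt.saturating_sub(1));
    base.saturating_mul(factor).min(max)
}

/// Sleeps for `total` in 1 ms slices, waking early if shutdown is requested.
/// Returns `true` if shutdown was requested.
fn sleep_or_shutdown(total: Duration, shutdown: &AtomicBool) -> bool {
    let deadline = Instant::now() + total;
    while !shutdown.load(Ordering::SeqCst) {
        let left = deadline.saturating_duration_since(Instant::now());
        if left.is_zero() {
            return false;
        }
        thread::sleep(left.min(Duration::from_millis(1)));
    }
    true
}

/// Consumes `channel` until `shutdown` is set. Failed connect/subscribe attempts
/// are retried with capped exponential backoff; a lost connection triggers a
/// reconnect and re-subscribe. Returns how many times the session recovered.
pub fn run_session<C: PubSubConn>(
    mut connect: impl FnMut() -> Result<C, String>,
    channel: &str,
    shutdown: &AtomicBool,
    mut emit: impl FnMut(String),
) -> u32 {
    // Illustrative values only, not taken from the PR.
    let (base, max) = (Duration::from_millis(5), Duration::from_millis(50));
    let mut failures = 0u32; // consecutive failed attempts, drives the backoff
    let mut connected_before = false;
    let mut reconnects = 0u32;

    while !shutdown.load(Ordering::SeqCst) {
        // Connect and (re-)subscribe as one step; either failure counts as an attempt.
        let mut conn = match connect().and_then(|mut c| c.subscribe(channel).map(|_| c)) {
            Ok(c) => c,
            Err(_) => {
                failures += 1;
                if sleep_or_shutdown(backoff(failures, base, max), shutdown) {
                    break;
                }
                continue;
            }
        };
        if connected_before {
            reconnects += 1;
        }
        connected_before = true;
        failures = 0; // the next outage starts again from `base`

        loop {
            if shutdown.load(Ordering::SeqCst) {
                let _ = conn.unsubscribe(channel); // best-effort graceful unsubscribe
                return reconnects;
            }
            match conn.next_message() {
                Ok(Some(msg)) => emit(msg),
                Ok(None) => thread::sleep(Duration::from_millis(1)),
                Err(_) => break, // connection lost: go back to reconnecting
            }
        }
    }
    reconnects
}

/// Scripted test double: pops queued events; an `Err` entry simulates a dropped connection.
pub struct ScriptedConn {
    pub events: VecDeque<Result<String, String>>,
}

impl PubSubConn for ScriptedConn {
    fn subscribe(&mut self, _channel: &str) -> Result<(), String> {
        Ok(())
    }
    fn next_message(&mut self) -> Result<Option<String>, String> {
        self.events.pop_front().transpose()
    }
    fn unsubscribe(&mut self, _channel: &str) -> Result<(), String> {
        Ok(())
    }
}
```

Resetting the failure counter after a successful subscribe means each new outage starts from the base delay again, and the returned recovery count is the kind of value a real implementation could report. An actual Vector source would more likely run this loop asynchronously on Tokio and react to the source's shutdown signal instead of polling a flag; the thread-based polling here only keeps the sketch self-contained.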
Vector configuration
```toml
[sources.redis_sub]
type = "redis"
data_type = "channel"
key = "my-events"
url = "redis://127.0.0.1:6379"
```
How did you test this PR?
Tested manually on a local setup.
Change Type
- [x] Bug fix
- [x] New feature
- [ ] Non-functional (chore, refactoring, docs)
- [ ] Performance
Is this a breaking change?
- [ ] Yes
- [x] No
Does this PR include user facing changes?
- [ ] Yes. Please add a changelog fragment based on our guidelines.
- [x] No. A maintainer will apply the `no-changelog` label to this PR.
References
Close #22615
Notes
- Please read our Vector contributor resources.
- Do not hesitate to use `@vectordotdev/vector` to reach out to us regarding this PR.
- Some CI checks run only after we manually approve them.
- We recommend adding a `pre-push` hook; please see this template.
- Alternatively, we recommend running the following locally before pushing to the remote branch:
  - `make fmt`
  - `make check-clippy` (if there are failures, it's possible some of them can be fixed with `make clippy-fix`)
  - `make test`
- After a review is requested, please avoid force pushes to help us review incrementally.
- Feel free to push as many commits as you want. They will be squashed into one before merging.
- For example, you can run `git merge origin master` and `git push`.
- If this PR introduces changes to Vector dependencies (modifies `Cargo.lock`), please run `make build-licenses` to regenerate the license inventory and commit the changes (if any). More details here.
Hi @gibranbadrul, thank you for this contribution. Please:
- Add a changelog fragment.
- Revisit the backoff: capping at 1 second after 4 retries is too aggressive.
- Consider adding a metric/internal event for reconnections.
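The point about the backoff cap can be made concrete with standard capped exponential-backoff arithmetic. Assuming (this is not confirmed by the PR text) that the retry delay doubles from a base $b$ up to a cap $c$, the $n$-th retry waits $d_n$, and the cap is first reached at retry $n^{*}$:

$$
d_n = \min\!\left(b \cdot 2^{\,n-1},\; c\right), \qquad
n^{*} = 1 + \left\lceil \log_2 \frac{c}{b} \right\rceil
$$

For a hypothetical base of $b = 125\,\text{ms}$ and cap $c = 1\,\text{s}$, $c/b = 8$, so $n^{*} = 1 + 3 = 4$: the delays are 125, 250, 500 and 1000 ms (1.875 s in total), after which a Redis server that stays down is retried about once per second indefinitely. With the same base and a hypothetical 30 s cap, $n^{*} = 1 + \lceil \log_2 240 \rceil = 9$, so the delays keep growing (up to 16 s on the 8th retry) before settling at 30 s.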