vector icon indicating copy to clipboard operation
vector copied to clipboard

feat(deps): update `tower` crate to v0.5.2 and fix partial retry logic

Open Serendo opened this issue 9 months ago • 17 comments
trafficstars

Summary

  1. Update the tower crate to 0.5 so that the retry function now takes a mutable reference of the request. which will be used to modify the original request when doing partial retries. As tower 0.5 has some breaking changes. Some args in the retry part are changed to mutable. Some service types are changed according to the order changed in the tower Buffer.
  2. Add a RetryPartial variant to RetryAction. This variant takes a closure which has the information of the response and will be used to modify the request so only sending the failed part of it.
  3. Implemented the RetryPartialFunction for elasticsearch sink, now that request_retry_partial will only send those previously failed requests. those successful requests will not be sent again.

Change Type

  • [ ] Bug fix
  • [x] New feature
  • [ ] Non-functional (chore, refactoring, docs)
  • [ ] Performance

Is this a breaking change?

  • [ ] Yes. This change will update tower to 0.5 which itself has breaking changes. but it should be transparent to users.
  • [x] No

How did you test this PR?

I wrote a python server mocking the elasticsearch bulk api. which will give a response in which the first 5 documents are 201 created, and others are 429. I can see that a perticular document will only get one 201.

Does this PR include user facing changes?

  • [x] Yes. Please add a changelog fragment based on our guidelines.
  • [ ] No. A maintainer will apply the "no-changelog" label to this PR.

Checklist

  • [x] Please read our Vector contributor resources.
    • make check-all is a good command to run locally. This check is defined here. Some of these checks might not be relevant to your PR. For Rust changes, at the very least you should run:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
  • [x] If this PR introduces changes Vector dependencies (modifies Cargo.lock), please run dd-rust-license-tool write to regenerate the license inventory and commit the changes (if any). More details here.

References

Closes: #140

#14891

Serendo avatar Feb 13 '25 09:02 Serendo

CLA assistant check
All committers have signed the CLA.

bits-bot avatar Feb 13 '25 09:02 bits-bot

Hi @Serendo, thank you for this PR! I think this is not a breaking change (from the perspective of Vector users). And also we will need a changelog to explain the fix.

pront avatar Feb 13 '25 17:02 pront

Hi @Serendo, thank you for this PR! I think this is not a breaking change (from the perspective of Vector users). And also we will need a changelog to explain the fix.

I have added a changlog fragment for this.

Serendo avatar Feb 14 '25 02:02 Serendo

Thank you for this @Serendo - excited to see it land in the next release!

tronboto avatar Feb 24 '25 08:02 tronboto

Hey @pront . What's still needed here to get this one across the line? Would be great to see it in the next release.

tronboto avatar Mar 11 '25 11:03 tronboto

Hey @pront . What's still needed here to get this one across the line? Would be great to see it in the next release.

Hi @Serendo and @tronboto, I am taking a look.

pront avatar Mar 11 '25 13:03 pront

Hi @Serendo, are you still working on this? I see you asked for a review last week but there are still outstanding review comments to be addressed. It would be great to see this feature land so if you need help getting this PR finished my team could help (I work on OSS @G-Research, @tronboto is a colleague of mine).

adamreeve avatar Apr 14 '25 22:04 adamreeve

Hi @Serendo, are you still working on this? I see you asked for a review last week but there are still outstanding review comments to be addressed. It would be great to see this feature land so if you need help getting this PR finished my team could help (I work on OSS @G-Research, @tronboto is a colleague of mine).

Hi @adamreeve, I forgot to push the commits and I thought I had pushed them. I'll push them later.

And yes it will be great to get some help from you. Thank you! As you see when modifying the original request, there is a metadata part which is usually generated from the events. But when filtering the requests, I can't use that part of the code as the requests are not in the shape of events now.

Serendo avatar Apr 15 '25 12:04 Serendo

Will review again once the merge conflicts are resolved.

pront avatar Apr 15 '25 15:04 pront

And yes it will be great to get some help from you. Thank you! As you see when modifying the original request, there is a metadata part which is usually generated from the events. But when filtering the requests, I can't use that part of the code as the requests are not in the shape of events now.

I had a brief look into this and agree it looks like a difficult problem to solve. I'm not familiar with how this metadata is used and how critical it is that all the metadata is exactly correct. Maybe it's enough to do a best-effort attempt to set the metadata fields where possible and leave some fields as over-estimates? It looks like at least event_count, request_encoded_size and request_wire_size should be possible to be set.

If I've understood correctly, it looks like this feature relies on no compression being used, otherwise std::str::from_utf8(req.payload.as_ref()) will likely fail. It seems like handling compression and building the correct metadata would both be a lot easier if you had access to the original events rather than only the request payload. Would it be feasible to store a copy of the original events in vector::sinks::elasticsearch::request_builder::Metadata and vector::sinks::elasticsearch::service::ElasticsearchRequest?

adamreeve avatar Apr 16 '25 03:04 adamreeve

And yes it will be great to get some help from you. Thank you! As you see when modifying the original request, there is a metadata part which is usually generated from the events. But when filtering the requests, I can't use that part of the code as the requests are not in the shape of events now.

I had a brief look into this and agree it looks like a difficult problem to solve. I'm not familiar with how this metadata is used and how critical it is that all the metadata is exactly correct. Maybe it's enough to do a best-effort attempt to set the metadata fields where possible and leave some fields as over-estimates? It looks like at least event_count, request_encoded_size and request_wire_size should be possible to be set.

If I've understood correctly, it looks like this feature relies on no compression being used, otherwise std::str::from_utf8(req.payload.as_ref()) will likely fail. It seems like handling compression and building the correct metadata would both be a lot easier if you had access to the original events rather than only the request payload. Would it be feasible to store a copy of the original events in vector::sinks::elasticsearch::request_builder::Metadata and vector::sinks::elasticsearch::service::ElasticsearchRequest?

That's a good idea. I'll try to put original events in the request and use that to build the request instead of using the payload.

Serendo avatar Apr 16 '25 12:04 Serendo

Hi @pront , can you do another review?

Serendo avatar May 14 '25 02:05 Serendo

Hi @pront, sorry to contact, but could you review this? It's a crucial functionality for us as we're running ElasticSearch with Vector :pray:

adezxc avatar Jun 05 '25 07:06 adezxc

Hi @Serendo and @adezxc, I was away for a couple of weeks. This is PR is non-trivial to review as is but we are motivated to review it carefully, @bruceg had an interesting idea: Could we split this PR so that the tower update is isolated from the behavior changes?

pront avatar Jun 09 '25 18:06 pront

Hi @Serendo and @adezxc, I was away for a couple of weeks. This is PR is non-trivial to review as is but we are motivated to review it carefully, @bruceg had an interesting idea: Could we split this PR so that the tower update is isolated from the behavior changes?

Hi @pront , the first commit in this PR is the one for the tower upgrade. It hould be OK to be splitted but I don't know how to split a PR. Can you give a hint on this?

Serendo avatar Jun 10 '25 08:06 Serendo

I think it's okay to create a new PR with only the tower upgrade, citing it as a pre-requisite for this one and it should be fine?

adezxc avatar Jun 10 '25 10:06 adezxc

Hi @Serendo and @adezxc, I was away for a couple of weeks. This is PR is non-trivial to review as is but we are motivated to review it carefully, @bruceg had an interesting idea: Could we split this PR so that the tower update is isolated from the behavior changes?

Hi @pront , the first commit in this PR is the one for the tower upgrade. It hould be OK to be splitted but I don't know how to split a PR. Can you give a hint on this?

Hello, you can create a new branch locally and git cherry-pick 287571c78fe3c92f2d1e612b6f3f7599fca230ed and create a new PR.

Later we can git merge origin master to merge it to this PR.

pront avatar Jun 10 '25 13:06 pront

Hi @Serendo, to help us review this, let me know the following please:

  1. after the rebase, are the old review comments still relevant? were they all addressed?
  2. update the PR description
  3. update the test plan

pront avatar Jun 25 '25 13:06 pront

Hi @Serendo, to help us review this, let me know the following please:

  1. after the rebase, are the old review comments still relevant? were they all addressed?
  2. update the PR description
  3. update the test plan
  1. Yes, the retry logic is the same as before.
  2. Updated
  3. This is unchanged.

Serendo avatar Jun 27 '25 02:06 Serendo

There a few open comments, will come back to this when I see new commits.

pront avatar Jul 10 '25 16:07 pront

Please take a look at the merge conflicts. We will take a look after those are fixed.

pront avatar Jul 30 '25 20:07 pront

Thanks for all the work on this 😄

tronboto avatar Aug 08 '25 11:08 tronboto