opentelemetry-android

Add ability to throttle exports when reading from disk.

Victorsesan opened this issue 1 year ago • 11 comments

Added an implementation which provides a flexible way to manage bandwidth usage when exporting spans, allowing for smoother data flow and preventing resource hogging. The size estimation logic can be further refined for specific use cases. Relates to #638.
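For context, a rough sketch of the kind of throttling wrapper this describes (illustrative only, not the PR's actual code; the class name, the byte budget, and the size estimate are all assumptions):

```kotlin
import io.opentelemetry.sdk.common.CompletableResultCode
import io.opentelemetry.sdk.trace.data.SpanData
import io.opentelemetry.sdk.trace.export.SpanExporter
import java.util.concurrent.TimeUnit

// Illustrative wrapper (not the PR's code): forwards to a delegate exporter but
// pauses the exporting thread when the estimated bytes in the current one-second
// window exceed a budget, instead of dropping data.
class ThrottlingSpanExporter(
    private val delegate: SpanExporter,
    private val maxBytesPerSecond: Long = 64L * 1024, // assumed default, not from the PR
) : SpanExporter {

    private var windowStartNanos = System.nanoTime()
    private var bytesInWindow = 0L

    override fun export(spans: Collection<SpanData>): CompletableResultCode {
        throttleIfNeeded(spans.sumOf { estimateSizeBytes(it) })
        return delegate.export(spans)
    }

    // Placeholder estimate only; a real implementation needs something closer to
    // the serialized size (this is exactly the concern raised in the review below).
    private fun estimateSizeBytes(span: SpanData): Long =
        64L + span.name.length + span.attributes.size() * 32L

    private fun throttleIfNeeded(incomingBytes: Long) {
        val now = System.nanoTime()
        if (now - windowStartNanos >= TimeUnit.SECONDS.toNanos(1)) {
            // Start a new one-second accounting window.
            windowStartNanos = now
            bytesInWindow = 0
        }
        bytesInWindow += incomingBytes
        if (bytesInWindow > maxBytesPerSecond) {
            // Budget exceeded: block until the window rolls over rather than drop.
            val sleepNanos = TimeUnit.SECONDS.toNanos(1) - (System.nanoTime() - windowStartNanos)
            if (sleepNanos > 0) TimeUnit.NANOSECONDS.sleep(sleepNanos)
            windowStartNanos = System.nanoTime()
            bytesInWindow = incomingBytes
        }
    }

    override fun flush(): CompletableResultCode = delegate.flush()
    override fun shutdown(): CompletableResultCode = delegate.shutdown()
}
```

The sketch blocks the exporting thread rather than dropping spans, and the size estimate is only a stand-in, which are exactly the points raised in the review comments below.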

Victorsesan avatar Oct 27 '24 02:10 Victorsesan

@Victorsesan are you able to come back to this any time soon? Thanks!

breedx-splk avatar Jan 21 '25 16:01 breedx-splk

Hey @breedx-splk, yes I will. I think the last change I made is still waiting on a maintainer review.

Victorsesan avatar Jan 21 '25 18:01 Victorsesan

  • https://docs.github.com/repositories/configuring-branches-and-merges-in-your-repository/managing-protected-branches/about-protected-branches

https://github.com/open-telemetry/opentelemetry-android/pull/663#discussion_r1941464425 is the only item left and it's ready to go; I will approve in the meantime.

marandaneto avatar Feb 04 '25 16:02 marandaneto

@Victorsesan seems like we're close, but the build is broken again.

breedx-splk avatar Apr 14 '25 23:04 breedx-splk

@Victorsesan, let us know if you can fix CI and rebase as well. Otherwise, @bidetofevil will 'hijack' it in good faith and get it mergeable.

marandaneto avatar May 27 '25 15:05 marandaneto

So I had a look at the PR, and I think it needs a few additional changes to be production-ready: namely, the algorithm to determine the size of a span in bytes is just a placeholder, and when the threshold is reached, the exported spans are not cached but simply dropped and not passed on to the delegate.

If it were an in-progress change, it might be reasonable to merge it, but unless there is a commitment to get this production-ready, I don't think it should be in the repo.

bidetofevil avatar May 27 '25 19:05 bidetofevil

So I had a look at the PR, and I think it needs a few additional changes to be production-ready: namely, the algorithm to determine the size of a span in bytes is just a placeholder, and when the threshold is reached, the exported spans are not cached but simply dropped and not passed on to the delegate.

If it were an in-progress change, it might be reasonable to merge it, but unless there is a commitment to get this production-ready, I don't think it should be in the repo.

I agree. Also, to add to those points:

  • The algorithm to determine the size of a span might not be straightforward to create, and even if we come up with a nice one, it might not be as processing-friendly as other options, such as the batch/time approach mentioned in the issue.
  • Dropping data should not be part of this solution. The closest I think we can get to an implementation that addresses this issue without dropping data would be to somehow break this loop before all the available data on disk is exported (see the sketch after this list).
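To illustrate the "break the loop" idea, here is a minimal sketch; the readNextBatch callback, the batch budget, and the timeout are hypothetical and not part of the project's disk-buffering API:

```kotlin
import io.opentelemetry.sdk.trace.data.SpanData
import io.opentelemetry.sdk.trace.export.SpanExporter
import java.util.concurrent.TimeUnit

// Hypothetical read loop: export at most a fixed number of batches per cycle and
// leave everything else on disk for the next cycle, so nothing is ever dropped.
fun exportSomeBatchesFromDisk(
    delegate: SpanExporter,
    readNextBatch: () -> List<SpanData>?, // hypothetical: next persisted batch, or null when empty
    maxBatchesPerCycle: Int = 5,          // assumed budget, tune per use case
) {
    repeat(maxBatchesPerCycle) {
        val batch = readNextBatch() ?: return      // nothing left on disk
        val result = delegate.export(batch)
        result.join(10, TimeUnit.SECONDS)
        if (!result.isSuccess) return              // give up for this cycle, retry later
    }
    // Budget reached: remaining batches stay on disk until the next cycle.
}
```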

LikeTheSalad avatar May 28 '25 10:05 LikeTheSalad

Thank you for creating this PR, @Victorsesan. The approach proposed here to solve the issue raises some important concerns, mentioned in the latest comments, that make it infeasible to merge unless we change the overall approach.

Going with a different approach would most likely require discarding all the existing changes in this PR, which is totally understandable if that’s more work than you planned for. So please let us know if you’re up for spending more time on it — if not, no worries, we can close this one and revisit it in a future PR.

Hi @LikeTheSalad, I can give it another go. Since the PR has been open for so long, I will be happy to see it completed regardless.

Victorsesan avatar May 28 '25 11:05 Victorsesan

Hi @LikeTheSalad, I can give it another go. Since the PR has been open for so long, I will be happy to see it completed regardless.

Got it, thank you @Victorsesan. If I understood correctly, it seems like you would like to try a different approach within this same PR, if that's the case then I'll keep it open. Cheers!

LikeTheSalad avatar May 28 '25 12:05 LikeTheSalad

A couple of suggestions that I think might simplify the solution:

  1. We can approach this from the read-from-disk side of the house.
  • Basically, replace the timed job mechanism for exporting batches and instead have the read side be triggered on demand, reading from disk when it's ready. When a batch is written to disk, the writer informs the reader that a batch is ready to go. The reader can decide whether it's ready to process it, and do so when it is. Once it exports a batch, it can schedule itself to determine when it should check next, and so on, until there are no more batches to read. The reader will be triggered again when a new batch is written to disk.
  • The advantage of this is that you won't read from disk until you're ready to send, which limits data loss if there's a crash during export, say, if you were using a buffering exporter to send data out.
  2. Instead of counting by bytes, just count by spans. And instead of cutting off right at the limit, just let the last batch go through.
  • We are just approximating things here to limit data flow, so there's no need to take on the complexity of being that fine-grained. I think counting logs and spans is sufficient for most cases, even if not entirely accurate. (A sketch of both suggestions follows this list.)
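A minimal sketch of how the two suggestions could fit together; the class, both callbacks, and the span budget are hypothetical and not the project's actual disk-buffering API:

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical on-demand reader: the write side calls onBatchWritten() after
// persisting a batch; the reader then drains batches until a span budget is hit,
// letting the batch that crosses the limit go through before stopping (suggestion 2).
class OnDemandDiskReader(
    private val peekNextBatchSpanCount: () -> Int?, // hypothetical: spans in the next batch, or null if none
    private val exportNextBatch: () -> Boolean,     // hypothetical: export and remove it, true on success
    private val maxSpansPerDrain: Int = 500,        // assumed budget; spans, not bytes
    private val executor: ExecutorService = Executors.newSingleThreadExecutor(),
) {
    // Called by the write side whenever a new batch lands on disk (suggestion 1).
    fun onBatchWritten() {
        executor.execute { drain() }
    }

    private fun drain() {
        var spansSent = 0
        while (spansSent < maxSpansPerDrain) {
            val batchSize = peekNextBatchSpanCount() ?: return // nothing left to read
            if (!exportNextBatch()) return                     // export failed; wait for the next trigger
            spansSent += batchSize                             // the crossing batch still goes through
        }
        // Budget reached; remaining batches stay on disk until the next write triggers us.
    }
}
```

Using a single-threaded executor keeps drains serialized, so repeated write notifications simply queue another pass rather than exporting concurrently.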

bidetofevil avatar Jun 03 '25 15:06 bidetofevil

Thanks for the suggestions @bidetofevil will keep that in mind while working on it

Victorsesan avatar Jun 03 '25 16:06 Victorsesan

This has been automatically marked as stale because it has been marked as needing author feedback and has not had any activity for 21 days. It will be closed automatically if there is no response from the author within 14 additional days from this comment.

github-actions[bot] avatar Dec 09 '25 16:12 github-actions[bot]