opentelemetry-collector-contrib
feat (processor/k8sattributes): wait for sync when starting the k8sattributes processor.
Description:
When starting the k8sattributes processor, block until an initial synchronization has completed. This solves #32556 (a minimal sketch of the idea follows below).
Link to tracking Issue:
#32556
Testing:
Tested in a cluster with constant high span traffic; no more spans with unassociated k8s metadata after applying this PR.
Documentation:
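For context, a minimal sketch of the blocking-on-initial-sync pattern the description refers to, using only standard client-go primitives; the helper name and where it is called from are illustrative assumptions, not this PR's actual code:

```go
package k8ssyncsketch

import (
	"context"
	"fmt"

	"k8s.io/client-go/tools/cache"
)

// waitForInitialSync starts the informer and blocks until its first
// list/watch cycle has completed, so that pod lookups made afterwards
// already see the cluster's existing metadata.
func waitForInitialSync(ctx context.Context, informer cache.SharedInformer) error {
	go informer.Run(ctx.Done())
	if !cache.WaitForCacheSync(ctx.Done(), informer.HasSynced) {
		return fmt.Errorf("k8s informer cache did not sync before the context was cancelled")
	}
	return nil
}
```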
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@h0cheung Hi there, I was wondering if you have the time to finalize and merge the pull request. Looking forward to your update. Thanks!
Thanks for your patience!
I have a suggestion about the two new options, start_timeout and error_when_timeout. The reason for this enhancement is to make sure the k8sattributes processor does not start consuming incoming traces or metrics until the initial synchronization is completed; otherwise, some traces/metrics/logs may be unassociated with k8s metadata.
But I think some users may care more about the collector's startup speed than the completeness of the k8s metadata.
So how about we change these two fields to:
force_wait_for_cache_synced bool (default is false)
wait_for_synced_timeout time.Duration
The users who care more about completeness could set force_wait_for_cache_synced to true, and if the sync is still not completed after wait_for_synced_timeout, the collector should return an error directly instead of running, since completeness cannot be guaranteed. @h0cheung @TylerHelmuth
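To make the trade-off concrete, here is a rough sketch of how the proposed pair could behave; the struct, field, and function names are assumptions derived from the comment above, not the PR's actual implementation:

```go
package k8ssyncsketch

import (
	"context"
	"fmt"
	"time"

	"k8s.io/client-go/tools/cache"
)

// Config mirrors the proposal: opt in to blocking, and bound the wait.
type Config struct {
	// force_wait_for_cache_synced: when true, Start blocks until the first sync completes.
	ForceWaitForCacheSynced bool `mapstructure:"force_wait_for_cache_synced"`
	// wait_for_synced_timeout: how long Start may block before giving up with an error.
	WaitForSyncedTimeout time.Duration `mapstructure:"wait_for_synced_timeout"`
}

// startWithOptionalWait either starts immediately (possibly with incomplete
// metadata for a while) or blocks until the cache is synced, returning an
// error on timeout so the collector fails fast instead of running without
// the completeness guarantee.
func startWithOptionalWait(ctx context.Context, cfg Config, informer cache.SharedInformer) error {
	go informer.Run(ctx.Done())
	if !cfg.ForceWaitForCacheSynced {
		return nil
	}
	waitCtx, cancel := context.WithTimeout(ctx, cfg.WaitForSyncedTimeout)
	defer cancel()
	if !cache.WaitForCacheSync(waitCtx.Done(), informer.HasSynced) {
		return fmt.Errorf("k8s metadata cache not synced within %s", cfg.WaitForSyncedTimeout)
	}
	return nil
}
```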
I agree. What are your thoughts? @TylerHelmuth
@TylerHelmuth could you take a look, please?
@fatsheep9146 I like that idea. I suggest naming them wait_for_metadata and wait_for_metadata_timeout to hide some of the details from the end user. I believe we already use the term metadata to refer to the pod data that is gathered and held.
This PR was marked stale due to lack of activity. It will be closed in 14 days.
remove stale
@fatsheep9146 I like that idea. I suggest naming them wait_for_metadata and wait_for_metadata_timeout to hide some of the details from the end user. I believe we already use the term metadata to refer to the pod data that is gathered and held.
Thanks. These two configuration items look simple and clear. I think we can use them.
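For readers following along, a sketch of what the agreed user-facing names might map to; the Go field names and mapstructure tags here are illustrative assumptions, not necessarily the code that was merged:

```go
package k8ssyncsketch

import "time"

// Config sketch using the agreed option names.
type Config struct {
	// wait_for_metadata: block processor start until pod metadata has been synced.
	WaitForMetadata bool `mapstructure:"wait_for_metadata"`
	// wait_for_metadata_timeout: fail start if the sync takes longer than this.
	WaitForMetadataTimeout time.Duration `mapstructure:"wait_for_metadata_timeout"`
}
```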
@h0cheung Hi, could you take some time to work out the code and merge the pull request? I’d really appreciate your update on this. Thanks a lot
@fatsheep9146, @TylerHelmuth please take a look
@h0cheung sorry for the delayed response, could you resolve the conflicts?
This PR was marked stale due to lack of activity. It will be closed in 14 days.
@h0cheung Hi there! I was wondering if you could possibly take a little more time to merge the feature? I’d really appreciate it!
Hello @h0cheung,
We wanted to thank you for your work on this pull request. It is very important to us because the related bug significantly affects our work. If you need any assistance or information to speed up the process, please let us know. We appreciate your efforts and look forward to its completion.
Thank you!
Updated, and skipped the lifecycle tests because they fail without KUBERNETES_SERVICE_HOST.
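For reference, a hedged sketch of how a lifecycle test can be skipped when not running inside a cluster; the test name is illustrative, not the actual generated lifecycle test:

```go
package k8ssyncsketch

import (
	"os"
	"testing"
)

// TestComponentLifecycle is skipped outside a cluster, where the
// KUBERNETES_SERVICE_HOST environment variable is not set.
func TestComponentLifecycle(t *testing.T) {
	if os.Getenv("KUBERNETES_SERVICE_HOST") == "" {
		t.Skip("KUBERNETES_SERVICE_HOST is not set; skipping lifecycle test outside a cluster")
	}
	// exercise Start/Shutdown of the processor here
}
```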
How do the prometheus-compliance-tests work? Is their failure caused by this PR?
It seems like the prometheus-compliance-tests are flaky on main, as mentioned in the issue
https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/35119
It seems the flaky tests were fixed, as mentioned in https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/35119
Therefore, @h0cheung could you please rebase one more time?
@TylerHelmuth pls take a look
@h0cheung As I can see from the tests, we need to run go mod tidy to fix the build. Could you please do it when you have time?
@h0cheung Hi!
It would be wonderful if you have time to run go mod tidy and push the changes to the branch.
@TylerHelmuth pls take a look