[WIP] feat: OpenTelemetry module integration
What this PR does / why we need it:
Types of changes
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] CVE Report (Scanner found CVE and adding report)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] Documentation only
Which issue/s this PR fixes
#9016 step 2
How Has This Been Tested?
Checklist:
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly.
- [ ] I've read the CONTRIBUTION guide
- [ ] I have added unit and/or e2e tests to cover my changes.
- [ ] All new and existing tests passed.
- [ ] Added Release Notes.
Does my pull request need a release note?
Any user-visible or operator-visible change qualifies for a release note. This could be a:
- CLI change
- API change
- UI change
- configuration schema change
- behavioral change
- change in non-functional attributes such as efficiency or availability, availability of a new platform
- a warning about a deprecation
- fix of a previous Known Issue
- fix of a vulnerability (CVE)
No release notes are required for changes to the following:
- Tests
- Build infrastructure
- Fixes for unreleased bugs
For more tips on writing good release notes, check out the Release Notes Handbook
PLACE RELEASE NOTES HERE
Hi @esigo. Thanks for your PR.
I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.
Once the patch is verified, the new status will be reflected by the ok-to-test label.
I understand the commands that are listed here.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/kind feature
/ok-to-test
@esigo , real plesure to see this coming along.
Thanks for the info posted here. But wishful thinking (don't know how much it helps yet) is post the entire sequence of steps, including collector or whatever you are using in your fork. Intention being someone may want to clone your branch and do make dev-env to experience this. At least some feedback may come out of that.
/ok-to-test
@esigo , real plesure to see this coming along.
Thanks for the info posted here. But wishful thinking (don't know how much it helps yet) is post the entire sequence of steps, including collector or whatever you are using in your fork. Intention being someone may want to clone your branch and do
make dev-envto experience this. At least some feedback may come out of that.
@longwuyuan, I updated the description. Please let me know anything is not clear. We could also arrange a session to walk through the example, if that helps.
added documentation here.
@esigo this is awesome. Next step is I request you create a new issue and state that the docs in this PR, need to be worked on in the context of going from a what is it currently to a more "normal worldly" set of instructions. Like copy this .... paste tht ... type that .... etc. Sans any make commands. etc etc
@rikatz , this is ready to test so PTAL
/triage accepted
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by: esigo
Once this PR has been reviewed and has the lgtm label, please ask for approval from rikatz by writing /assign @rikatz in a comment. For more information see the Kubernetes Code Review Process.
The full list of commands accepted by this bot can be found here.
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
Hi @esigo , this is brilliant work and I can’t wait to have nginx otel on an EKS cluster!
Trying to test this, hit a couple of issues, and maybe this feedback can help/ maybe I’m doing something weird.
First, there is one change in setup: I’m running this locally on a minikube cluster (with minikube tunnel running so that the ingress-nginx-controller can acquire an external IP).
I ran verbatim https://github.com/esigo/ingress-nginx/blob/otel-9016-3/docs/user-guide/third-party-addons/opentelemetry.md using the content of otel-9016-3 branch in esigo/ingress-nginx:
- I modified
charts/ingress-nginx/values.yamlto define the extraModules name and image, then applied the helm charthelm install -f charts/ingress-nginx/values.yaml ingress-nginx charts/ingress-nginx - When installing the above helm chart, the ingress-nginx-controller was created in the
defaultnamespace rather than thekube-systemnamespace. So, I had to apply this configmap to the default namespace (same command you specify, only with s/kube-system/default - All done precisely as you state here. Note that a couple of lines require waiting for pods to be healthy (cert-manager takes a bit of time to get going) before proceeding to the next line
- Here I had to run
$(minikube docker-env)first, so that images built locally are available within the minikube cluster. Thenmake images && make deploy-appworked great. (With kind instead of minikube, not necessary.) - Here, no port forwarding was required (
minikube tunnel), and the curl worked precisely as described. Nice demo app! - …
- Skip straight to Jaeger UI: I see beautiful spans from microapp-service and microapp-service-b. But the nginx span is missing
- Same in Zipkin: we see spans from microapp-service and microapp-service-b, but nothing from nginx
So, we have a nice minikube cluster with your demo app running and passing telemetry to an otel collector, and we have cool observability backends (verified Jaeger and Zipkin) making the otel viewable.
But I don’t yet have nginx otel. What bit of it isn’t working?
There are no opentelemetry logs in the ingress-nginx-controller pod:
$ kubectl logs ingress-nginx-controller-7fdf6f78df-h4dt6 -c opentelemetry
$
The controller logs don't include any mention of opentelemetry (despite configmap settings):
$ kubectl logs ingress-nginx-controller-7fdf6f78df-h4dt6
Defaulted container "controller" out of: controller, opentelemetry (init)
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.4.0
Build: 50be2bf95fd1ef480420e2aa1d6c5c7c138c95ea
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.10
-------------------------------------------------------------------------------
...
I guess a custom local controller image build (including the changes in this module integration PR) is necessary here?
Small update:
- where I followed the ingress-nginx dev guide (so, local build of the controller image, all running on a kind cluster), everything worked exactly as documented
- copying across the local build of the controller image, everything worked nicely locally in minikube too
- I of course expect this to work flawlessly IRL and I'm proceeding to push out this custom nginx ingress controller to a test EKS cluster
Thank you @esigo ! This is awesome and I can't wait for this integration to be merged!
@shaundaley39 thanks for testing and your kind feedback. I use k3s for testing, you may check my test setup here. There will be some changes due to #9286 and #9308. I will rebase and apply new changes when the image is promoted https://github.com/kubernetes/k8s.io/pull/4526. The new way of enabling opentelemetry will be via values: https://github.com/kubernetes/ingress-nginx/blob/2cb3ce5db68c4d3f1022068527e55c1018dccf9d/charts/ingress-nginx/values.yaml#L594
Is there an update on what needs to be done to get this into production? I would love to help out if it would be something I can help with.
Is there an update on what needs to be done to get this into production? I would love to help out if it would be something I can help with.
Hi @thomaschaaf, Thanks for your interest. Just review from @rikatz is missing ;)
@esigo is it possible to request review by some one else? @rikatz seems not to have been active on github in the last 2 weeks. (according to the activity report on his account)
I found https://github.com/kubernetes/ingress-nginx/issues/9496#issuecomment-1418449871 which is indicating he is currently on holiday.
@thomaschaaf I asked in the slack channel.
+1, Waiting for that...
+1, waiting this :)
Hey @esigo , This is great work. So i was tracing ingress-nginx with opentracing and stumbled upon this guide. Instrument nginx with OpenTelemetry. Following this i could see tracing done at nginx module level but following the steps you have mentioned, i get a single span at nginx(no internal nginx module tracing info).
Am i doing something wrong, or the guide i mentioned and the work you are doing are completely different. Or is their any way of enabling module level tracing as well.
Hi @Dipenbhatt03, We are using the other implementation. You probably need to check the documentation here https://github.com/kubernetes/ingress-nginx/pull/9144. I didn't get the question. You could see the spans for the other only when they are instrumented too.
Hello @esigo, do you know when this could be merged and released? Regards!!!
Hello @esigo, do you know when this could be merged and released? Regards!!!
Hi @csepulveda I'll demo this in the next SIG meeting (next Thursday). It should be merged soon after that :)
status:
I demoed this in the last SIG meeting (16.03.2023). Expected to be released with 1.7.0.
/hold
@strongjz merged documentation from #9144 (after addressing the review comments there).
/label tide/merge-method-squash
/lgtm
/unhold /lgtm /approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: esigo, strongjz
The full list of commands accepted by this bot can be found here.
The pull request process is described here
- ~~OWNERS~~ [strongjz]
Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment