cloud-on-k8s icon indicating copy to clipboard operation
cloud-on-k8s copied to clipboard

feat(helm): implementation of integrations chart

Open pkoutsovasilis opened this issue 1 year ago • 6 comments

I open this draft PR to capture the footprint of the changes required to satisfy https://github.com/elastic/elastic-agent/issues/3848 and kickoff further discussions for https://github.com/elastic/elastic-agent/issues/3847. So please don't merge this PR as nothing is concrete yet. With that said if any comments more than welcomed.

pkoutsovasilis avatar Nov 30 '23 16:11 pkoutsovasilis

💚 CLA has been signed

Overall this is looking great.

joshdover avatar Dec 04 '23 16:12 joshdover

@pkoutsovasilis really nice start here. For me everything worked right away. Adding some general comments here for further discussion:

-CCing https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/values.yaml as an example. Shall we start using this naming convention and schema even from day1? I know is a big change for now but maybe worths the effort in order to start being compliant with Otel

  • We should add comments to all fields of value.yaml. Not for this POC of course but for later stable releases

  • Adding the idea of schema from https://github.com/open-telemetry/opentelemetry-helm-charts/blob/main/charts/opentelemetry-collector/values.schema.json. This can help us in some initial tests, validations and linting

  • Right now I can not disable kube-state-metrics if I dont want to. Are we thinking of doing so?

  • There are two fields that users change: a) logs path and b) scraping period of metrics. Are those part of this POC?

gizas avatar Dec 05 '23 09:12 gizas

I tried this following the multiple integration example and it worked as expected.

I think it would be nice to have an example of how setup an equivalent set of Fleet managed agents with the separate daemonset and deployment with the initial agent container configuration overridden (e.g. with leader election disabled everywhere). Perhaps this is intended as a follow up, but users regularly ask for this and it would have simplified several recent support cases.

The only trouble I ran into was when experimenting with the es-ref-secret. If you omit the expected values (for example trying to use the not yet implemented api-key instead of password) the operator doesn't create anything and there's no obvious error.

You can figure out what happened by looking at the operator logs, but users coming from the current deployment pattern aren't going to expect this or know to look for it. Is there a way to make these failures more immediately obvious? Or is this something we have to solve with documentation?

For example this is the error message the controller gave me, which was very clear, but you have to know to look at the controller logs to get it.

{"log.level":"error","@timestamp":"2024-04-05T19:23:47.829Z","log.logger":"manager.eck-operator","message":"Reconciler error","service.version":"2.12.1+a56215f7","service.type":"eck","ecs.version":"1.4.0","controller":"agent-es-association-controller","object":{"name":"agent-pernode","namespace":"default"},"namespace":"default","name":"agent-pernode","reconcileID":"5ee3a9cb-cafb-4aaa-af40-d8a2e221711f","error":"username secret key doesn't exist in secret es-ref-secret","errorCauses":[{"error":"username secret key doesn't exist in secret es-ref-secret"}],"error.stack_trace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:329\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227"}

thanks for trying this out @cmacknz. The case you stumbled upon is coming from the operator itself. You are right in terms of plain k8s objects nothing appears to be created but Agent CRDs (this is what we deploy here) should be there and their status should indicate that something is happening but not getting ready. With that said ofc we will fortify this through documentation and have a look around how we could make such cases more "visible"

pkoutsovasilis avatar Apr 09 '24 17:04 pkoutsovasilis

Agent CRDs (this is what we deploy here) should be there and their status should indicate that something is happening but not getting ready.

Yes I actually forgot to check the agent CRDs and was just doing kubectl get pods when working since I'm not used to working with ECK by reflex yet.

cmacknz avatar Apr 09 '24 20:04 cmacknz

builkite test this -f p=gke,t=TestIntegrationsChartDefaultKubernetes,s=8.13.2

thbkrkr avatar Apr 26 '24 07:04 thbkrkr

sorry for keeping this open for so long, but based on some internal discussions it was decided that this Helm chart will find its home in elastic-agent repo. Expect that, ECK operator will still be supported by the "new" helm chart there, but there will be also an option to deploy elastic-agent as plain k8s objects. Thank you for all the help @thbkrkr and @pebrc 🙏

pkoutsovasilis avatar Aug 16 '24 16:08 pkoutsovasilis