tetragon
tetragon copied to clipboard
Remove remaining Cilium dependency from the Tetragon project
Tetragon can run both with and without Cilium on the same node. Some functionality, however, still depends on the Cilium agent being present. Specifically, Tetragon uses Cilium to retrieve the pod information for destination IPs of pods that are not local to the node. The goal of this project is to introduce this functionality in Tetragon itself. One approach would be for the Tetragon agent to keep information about all pods in the cluster, but this does not scale well because the k8s API server would need to propagate all pod information to all nodes. Instead, the plan is to introduce a new custom resource (CR), maintained by the Tetragon operator, that provides a mapping from IPs to the small subset of pod information that Tetragon needs. The Tetragon operator will monitor pod information and update the resource as needed. Tetragon agents will watch this CR to provide pod information for destination IPs.
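To illustrate the agent side of this design, here is a minimal, dependency-free sketch of the IP-to-pod-info cache a Tetragon agent could maintain from watch events on the custom resource. All names (`PodInfo`, `ipCache`, `lookup`, and the field set) are hypothetical illustrations, not the actual Tetragon types.

```go
package main

import (
	"fmt"
	"sync"
)

// PodInfo is a hypothetical subset of pod fields that Tetragon needs;
// the real schema would be defined by the custom resource.
type PodInfo struct {
	Name      string
	Namespace string
	PodIP     string
}

// ipCache maps pod IPs to PodInfo. The agent would populate it from
// watch events on the CR rather than by listing pods itself.
type ipCache struct {
	mu   sync.RWMutex
	byIP map[string]PodInfo
}

func newIPCache() *ipCache {
	return &ipCache{byIP: make(map[string]PodInfo)}
}

// upsert reflects an added or updated custom resource into the cache.
func (c *ipCache) upsert(info PodInfo) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.byIP[info.PodIP] = info
}

// delete removes the entry when the custom resource is deleted.
func (c *ipCache) delete(ip string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.byIP, ip)
}

// lookup resolves a destination IP to pod info, the operation that
// previously required asking the Cilium agent.
func (c *ipCache) lookup(ip string) (PodInfo, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	info, ok := c.byIP[ip]
	return info, ok
}

func main() {
	c := newIPCache()
	c.upsert(PodInfo{Name: "web-0", Namespace: "default", PodIP: "10.0.1.5"})
	if info, ok := c.lookup("10.0.1.5"); ok {
		fmt.Printf("%s/%s\n", info.Namespace, info.Name)
	}
}
```

The RWMutex keeps lookups cheap on the hot path (event processing) while watch-driven updates take the write lock.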
please feel free to contact [email protected] (github id: @michi-covalent) if you'd like to get some feedback for your draft proposal before the application deadline.
Hi @sharlns , I am a student from the SEL laboratory of Zhejiang University, familiar with cloud native, Kubernetes, Docker, and Go. I'm learning eBPF and I have also participated in an eBPF project. Therefore, I think this project is quite suitable for me. I plan to apply for GSoC 2023 and to work on this project. Do you have any suggestions to help me get started?
Hi @Lan-ce-lot Thanks for your interest in the project. If you want to get started working with the project, I would suggest checking out the getting started guide and some of the good first issues. This project will be worked on by whoever is selected as the GSoC mentee
Thanks for your advice @xmulligan.
I came across this issue a few days ago, and I am interested in contributing to it. Because of this issue, I started learning about CRDs and operators, and as I learn more, I'm understanding the issue and the code better. I am also working on my proposal in parallel. @xmulligan it would be a great help if you could review my proposal.
Thanks for your interest @prateek041. Unfortunately, we cannot review your proposal beforehand because it would be unfair to the other applicants
@xmulligan , sorry if I caused any confusion. I read in the official GSoC mentee guide that an applicant can submit their proposal as a draft as early as possible, so that mentors can review it, provide feedback, and suggest changes. This can be done before the submission deadline.
Here is the link: https://google.github.io/gsocguides/student/writing-a-proposal#submit-a-proposal-early
thanks for the pointer @prateek041! please feel free to send me your draft if you need some feedback on your proposal before the application deadline. i'll add my contact info in the issue description.
As this issue was not selected for GSoC 2023, is Tetragon planning to participate in the LFX June term?
@kkourt @michi-covalent
hey @prateek041 👋 yeah that is a possibility. i'll discuss this with @kkourt next week and update this ticket 🙏
That would be great, since I was really looking forward to working on it under a mentor. Just a gentle reminder that the last date for applications is Tue, May 9, 5:00 PM PDT, according to the official page
@michi-covalent
ok i opened a pull request here https://github.com/cncf/mentoring/pull/957 let's see what happens.
Hello @michi-covalent
My name is Mahesh and I'm really interested in working on this project during the LFX Summer term. Since I have been working with Kubernetes and Go for quite some time now, I think this project is perfectly suitable for me. I would definitely appreciate it if you could list some resources and the IRC channel!
Thanks !
hello 👋 thank you all for your interest in this project. the application page is here: https://mentorship.lfx.linuxfoundation.org/project/659fe584-68e6-46bf-bd13-12653ef60268
if you have any questions, either:
- add a comment in this github issue, or
- post a message in tetragon slack channel: https://cilium.slack.com/archives/C03EV7KJPJ9
apologies we do not have capacity to reply to direct messages / emails 🙏
Thanks for the response. I tried to join the slack channel but it requires an email with the @linuxfoundation.org domain. I emailed you yesterday to ask whether someone has already submitted a proposal for this project; if not, I just wanted to show you the proposal I am working on before submitting it to the LFX website.
hi @Mo-Fatah 👋
hmm that's strange, it should not require @linuxfoundation.org email to join tetragon slack channel. could you try https://cilium.herokuapp.com/ and see if it works?
there have been multiple proposals to this project. please see https://github.com/cncf/mentoring/discussions/937 for the application timeline 📆
it worked, thank you so much :smile:
Hello @sharlns and @michi-covalent I am interested in learning about this project and want to work on it under the LFX Mentorship. This issue also seems like a great starting point for contributing to Cilium. Landed here from the LFX Mentorship projects.
Additionally, I was wondering if there is anything else I can do to get started, such as researching and learning about the project from the existing documentation.
hi @YashPimple 👋
to learn more about tetragon, you can start with running through use cases in https://github.com/cilium/tetragon/blob/main/README.md. you can find more comprehensive documentation in https://tetragon.cilium.io/docs/.
Hi, @michi-covalent I will definitely check out the use cases in the GitHub repository and explore the comprehensive documentation on the official Tetragon website. It seems like a great resource to dive deeper into understanding Tetragon. Thank you for your help!
@prateek041 please post your high level plan here in terms of how you are approaching this project 🙏
Sure @michi-covalent I am writing it. Just finishing it up. thanks for the heads up 😄
High level overview of the plan
The entire project of building the operator is split into five phases:
- Writing the PodInfo CRD
- Writing the Operator
- Creating/Updating Helm charts
- Integrating into Tetragon with a feature flag
- Performance testing of an alternative approach that queries the k8s API every time pod info is needed.
@michi-covalent
I read more about building operators and learnt new things. I will keep adding details to the plan as I learn more. I am choosing this approach, rather than directly writing the "best approach" solution, to maximize my learning. Here is a little more detail about the implementation.
Create the CRD
The PodInfo CRD will contain the pod information that is strictly necessary for Tetragon. This can be done by replicating the pod information that Cilium endpoints held and storing exactly the same information in the PodInfo CRD.
Controller
The controller will use a PodInformer with three handlers: Add, Update, and Delete. Whenever a pod changes, the controller will run the logic to reflect the change into the custom resource, depending on which of the Add, Update, or Delete handlers needs to run.
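In client-go terms, these three handlers would map onto `cache.ResourceEventHandlerFuncs`. A dependency-free sketch of the dispatch logic (with a simplified `pod` stand-in for `corev1.Pod`; all names here are hypothetical) might look like:

```go
package main

import "fmt"

// pod is a simplified stand-in for corev1.Pod.
type pod struct {
	UID   string
	Name  string
	PodIP string
}

// store holds the desired state of the PodInfo custom resources,
// keyed by pod UID.
type store map[string]pod

// onAdd, onUpdate, and onDelete mirror the Add, Update, and Delete
// handlers a PodInformer would invoke; each reflects the pod change
// into the custom-resource store.
func (s store) onAdd(p pod)           { s[p.UID] = p }
func (s store) onUpdate(_ pod, p pod) { s[p.UID] = p }
func (s store) onDelete(p pod)        { delete(s, p.UID) }

func main() {
	s := store{}
	s.onAdd(pod{UID: "u1", Name: "web-0", PodIP: "10.0.1.5"})
	s.onUpdate(pod{}, pod{UID: "u1", Name: "web-0", PodIP: "10.0.1.6"})
	fmt.Println(s["u1"].PodIP) // 10.0.1.6
	s.onDelete(pod{UID: "u1"})
}
```

A real controller would replace the in-memory `store` with create/update/delete calls against the PodInfo custom resource via the API server.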
Question: how will I test whether the operator works properly?
Integration
Replace the Cilium endpoint with the PodInfo CRD, but provide a way to check whether the PodInfo CRD is supposed to run (I will look into this more).
Perf testing the second approach
Create a simple client that fetches pod information directly, then replace the Cilium endpoint with this client; the Tetragon pods will use it to fetch pod information on demand.
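One way to keep the two approaches comparable for perf testing is to hide the fetch behind an interface, so the same call site can be backed either by the watched-CRD cache or by a direct API-server query. A self-contained sketch, with all names hypothetical and a fake standing in for a real client-go-backed implementation:

```go
package main

import (
	"errors"
	"fmt"
)

// PodInfo is a hypothetical subset of pod fields Tetragon needs.
type PodInfo struct {
	Name      string
	Namespace string
}

// podInfoClient abstracts "fetch pod info for an IP on demand", so the
// CRD-cache approach and the direct-query approach are interchangeable
// behind the same interface during perf testing.
type podInfoClient interface {
	FetchByIP(ip string) (PodInfo, error)
}

// fakeAPIClient stands in for an API-server-backed client; a real
// implementation would query pods via client-go instead of a map.
type fakeAPIClient struct {
	pods map[string]PodInfo
}

func (f *fakeAPIClient) FetchByIP(ip string) (PodInfo, error) {
	info, ok := f.pods[ip]
	if !ok {
		return PodInfo{}, errors.New("no pod for IP " + ip)
	}
	return info, nil
}

func main() {
	var c podInfoClient = &fakeAPIClient{pods: map[string]PodInfo{
		"10.0.1.5": {Name: "web-0", Namespace: "default"},
	}}
	if info, err := c.FetchByIP("10.0.1.5"); err == nil {
		fmt.Printf("%s/%s\n", info.Namespace, info.Name)
	}
}
```

With both implementations satisfying `podInfoClient`, the load test can drive the same lookup workload against each and compare latency and API-server load.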
Question: How do I create the load here?
Please feel free to give feedback.
@michi-covalent
Here is what I believe the Custom Resource should look like (api/v1):
type PodInfoSpec struct {
    PodIP       string `json:"podIP"`
    PodMetadata string `json:"podInfo"`
}

type PodInfoMapper struct {
    metav1.TypeMeta   `json:",inline"`
    metav1.ObjectMeta `json:"metadata,omitempty"`
    Spec              PodInfoSpec `json:"spec,omitempty"`
}

type PodInfoMapperList struct {
    metav1.TypeMeta `json:",inline"`
    metav1.ListMeta `json:"metadata,omitempty"`
    Items           []PodInfoMapper `json:"items"`
}
Now the controller watches for pod-related events (create, update, delete), takes the information out of the request object, and reflects the changes into the spec of PodInfoMapper (with custom logic).
The next task I am looking at is setting up the controller to watch for pod-related events; I believe I need to make some changes in the SetupWithManager function. The Kubebuilder book will help, but suggestions are still welcome.
Question: I am not sure if the status field is necessary here? If yes, what would it be used for?
Overall Feedback is very much appreciated. @michi-covalent
thanks prateek, please go ahead and open a pull request that defines these types. it's easier to get feedback.
Question: I am not sure if the status field is necessary here? If yes, what would it be used for?
we don't need the status field for now. it's used to indicate the runtime state of a resource. for example for pods => https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.27/#podstatus-v1-core
@michi-covalent can we close this issue? 🥺
we need to delete these unused packages:
- https://github.com/cilium/tetragon/tree/main/pkg/cilium
- https://github.com/cilium/tetragon/tree/main/pkg/oldhubble
we haven't deleted them yet because there could be downstream projects that depend on these packages.
I am trying to understand where, in the current codebase, the PodInfo CRD is used to "retrieve the pod information for destination IPs for pods which are not local to the node".
I only saw the FindPodInfoByIP function using it, and that function is only called in the test file.
ah it appears it never got fully fleshed out, probably for exactly the above question: when/where is the right place to use it. We could/should probably remove the dead code until it has a user. Feel free to push a PR if you want.
I used gomod to check how Tetragon depends on cilium/cilium Go packages. Here we go:
gomod graph --style cluster=full -p 'deps(github.com/cilium/tetragon/**, 1) inter rdeps(github.com/cilium/cilium/**, 1)' > tetragon-cilium.dot && dot -Tpng -o tetragon-cilium.png tetragon-cilium.dot
Currently the version of the k8s libraries in Tetragon is tied to the Cilium version. It would be nice to decouple them. Here are Tetragon's transitive dependencies on k8s libraries via Cilium:
gomod graph --style cluster=full -p 'deps(github.com/cilium/tetragon/**, 1) inter rdeps(github.com/cilium/cilium/**, 1) inter (rdeps(k8s.io/**) + rdeps(sigs.k8s.io/**))' > tetragon-cilium-k8s.dot && dot -Tpng -o tetragon-cilium-k8s.png tetragon-cilium-k8s.dot
To make it clear: these are dependencies in Go code only, not runtime dependencies. We can try to remove some of them in a separate issue; for now I'm just dumping the pictures here.