[EPIC] Design the overall solution for GlobalTracer/API
Currently, the interaction between the gadget client and the gadget tracer manager is mostly done through CRDs. This is very simple to use for the current setup, but it makes other setups more complex (e.g. Headlamp plugins, running individual gadgets, coordinating data across nodes).
Instead, we would like to have a solution based on a gRPC API that can be used without creating CRDs. This issue is about designing that solution, which will probably mean first discussing the overall design, then creating an EPIC and a bunch of individual issues for it.
Whether we should continue with https://github.com/kinvolk/inspektor-gadget/issues/206 and https://github.com/kinvolk/inspektor-gadget/pull/586 depends on the outcome of this discussion.
Before jumping into the design of a new API, I'd like to describe the current implementation.
We define a Trace custom resource. It receives as configuration the node, gadget, filter, output mode and parameters. As results, it provides the state, output, operation error and warning. There is a trace controller that runs on each node and handles the creation, update and removal of Trace resources.
Once the Trace resource is created, the client controls it by updating the Operation annotation. This triggers the trace controller, which calls the right operation on each gadget. The gadget then executes the operation and updates the trace status. The client waits for the right trace state value, and the information contained in the Trace's status is then provided to the user. Finally, the Trace resources are removed.
The previous description is valid for gadgets that provide output a single time, like snapshot and profile. Gadgets that provide a "stream" of events (audit, top, trace) also use the "kube-exec" API to deliver those events to the client.
The following diagram tries to summarize the most important steps:
Using a CRD-based implementation has some pros and cons:
Advantages:
- CRD is a well-known paradigm, users can define their own CRs and use Inspektor Gadget without using kubectl-gadget.
- We leverage RBAC. We also don't have to worry about exposing a service.
Disadvantages:
- It's not possible to support stream gadgets properly: kube-exec or another mechanism is always needed for that kind of gadget. (https://github.com/kinvolk/inspektor-gadget/issues/372)
- This is because Kubernetes doesn't provide a way to stream events on a custom resource.
- A trace resource can only be used in a single node. A GlobalTrace is needed.
- There is no proper API exposed. The only way to interact with Trace resources, a resource annotation, is limiting. The client has to check the trace state before reading results. All of this leads to a difficult client implementation and also makes it harder to integrate with 3rd party apps.
Given these limitations, especially the one about stream gadgets, I think we should look for an alternative to the CRD implementation. The proposed solution should be:
- secure: it should provide an authorization/authentication mechanism.
- scalable: it should allow streaming data from multiple gadgets at the same time.
- easy to integrate into 3rd party tools.
It seems to me the obvious solution is to go for a gRPC API, but I first want to be sure we agree on this before moving on to the API details themselves.
/cc @alban @blanquicet @eiffel-fl
I think what you wrote summarizes what we have discussed so far in the different team meetings. I agree that there are more advantages than disadvantages to moving to a gRPC API, so I support the idea of making this change.
BTW, another example where the CRD-based approach makes the client implementation more difficult is the top gadgets, which are quite complicated at the moment. In general, a gRPC API would be more responsive and it would indeed make the client implementation easier.
I totally agree on using the gRPC API and I think we all agreed about it during the team meetings. The thing we still needed to settle was the architecture of the solution, but my memory of it is sadly not so clear.
I think that before taking this decision, we need to explain:
- Why gRPC alleviates or fixes the disadvantages you mentioned.
  - kube-exec: what is replacing it? In the case of Headlamp and kubectl-gadget: which component would be the gRPC client? Would it be Headlamp/kubectl-gadget connecting to each gadget pod individually or connecting to a yet-to-be-designed intermediary component that would then talk to each gadget pod?
  - Removing the need for GlobalTrace: how would that work with gRPC? Where would the network traces be merged and converted to network policies?
  - Difficult implementation in the client: why would it be easier with gRPC? Could you give a diagram showing the gRPC methods and how the CLI / GUI would wait for the BPF tracers to be ready on each node?
- What's the impact on the advantages listed, and on some not listed?
  - CRD = well-known paradigm accessible with HTTP/REST on the API server: would Headlamp plugins need to switch to gRPC? If so, are there good JavaScript libraries to implement a gRPC client?
  - RBAC access: What would replace this?
  - GlobalTrace has an advantage not listed: if nodes are created or deleted while an IG trace is running or starting, the GlobalTrace controller can automatically create/delete the Trace resources accordingly, so the CLI does not have to worry about it.
* Why gRPC alleviates or fixes the disadvantages you mentioned.
* kube-exec: what is replacing it?
There is no direct replacement for kube-exec because it's not needed. The data from the gadgets will be "streamed" back to the client through the gRPC interface. It's still not totally clear how to expose this API to the client: one possibility is to create a Kubernetes service and expose it (load balancer, ingress, etc.); another approach is to use port forwarding.
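As a rough sketch of the port-forwarding variant, the client could dial a local forwarded port; the port, the package layout and the idea of dialing the gadget pod directly are all illustrative assumptions, not a decided design:

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialGadgetPod assumes "kubectl port-forward <gadget-pod> 8888:8888" is already
// running, so the gRPC server is reachable on localhost.
// Auth/TLS is intentionally left out here; see the RBAC discussion below.
func dialGadgetPod(ctx context.Context) (*grpc.ClientConn, error) {
	conn, err := grpc.DialContext(ctx, "localhost:8888",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Printf("dialing gadget pod: %v", err)
		return nil, err
	}
	return conn, nil
}
```

With the Kubernetes service option, only the target address would change; the rest of the client code stays the same.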
In the case of Headlamp and kubectl-gadget: which component would be the gRPC client?
Headlamp and kubectl-gadget themselves. We can provide a .proto definition of the interface, and different clients can generate the language-specific bindings to use it. For Go we could even provide a library that builds on top of the gRPC API.
Would it be Headlamp/kubectl-gadget connecting to each gadget pod individually or connecting to a yet-to-be-designed intermediary component that would then talk to each gadget pod?
This is something that still has to be decided. As you mentioned, there are two options: (1) connect to each pod, or (2) connect to a "central" component. The second option makes the client easier to implement, as it doesn't have to connect to each pod; however, implementing this "intermediary" component can be very difficult, as it needs to scale well and can easily become a bottleneck. I'm also wondering if we can delay this discussion: we can go with option 1 for the time being, and in the future we could implement option 2 if needed...
- Removing the need for GlobalTrace: how would that work with gRPC? Where would the network traces be merged and converted to network policies?
It depends on the discussion above. With option 1 the client will receive information about traffic on all nodes and will have to merge it together to generate network policies. With option 2 we could provide an API that directly returns the network policies.
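To make the client-side merging that option 1 implies a bit more concrete, a minimal fan-in sketch could look like the following; NetworkEvent and the per-node channels are placeholders for whatever the per-node gRPC streams would deliver:

```go
package main

import "sync"

// NetworkEvent is a placeholder for the event type the .proto would define.
type NetworkEvent struct {
	Node, SrcPod, DstPod string
}

// mergeNodeEvents fans in the per-node event channels (each fed by one gRPC
// stream) so the network-policy generation code consumes a single stream.
func mergeNodeEvents(perNode ...<-chan NetworkEvent) <-chan NetworkEvent {
	out := make(chan NetworkEvent)
	var wg sync.WaitGroup
	wg.Add(len(perNode))
	for _, ch := range perNode {
		go func(c <-chan NetworkEvent) {
			defer wg.Done()
			for ev := range c {
				out <- ev
			}
		}(ch)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}
```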
* Difficult implementation in the client: why would it be easier with gRPC?
AFAIU the CRD API doesn't have a structured way to make remote calls. I mean, there is no straightforward mechanism to tell the controller to perform an operation and then get the result. We're implementing it through an annotation on the CR, but it seems to me the resulting logic is complicated, as a single operation involves multiple steps: setting the annotation, waiting for the trace state, etc. In gRPC, all of this could be a single RPC.
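As a sketch of the difference, using hypothetical generated bindings in a pb package (none of these names exist yet):

```go
// runSnapshot contrasts with the CRD flow described above: instead of
//   create the Trace CR -> set the Operation annotation -> wait for the
//   trace state -> read the status output -> delete the Trace CR,
// a single (hypothetical) RPC returns the result directly.
func runSnapshot(ctx context.Context, client pb.GadgetServiceClient) (string, error) {
	resp, err := client.Snapshot(ctx, &pb.SnapshotRequest{Gadget: "process"})
	if err != nil {
		return "", err
	}
	return resp.Output, nil
}
```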
Could you give a diagram showing the gRPC methods and how the CLI / GUI would wait for the BPF tracers to be ready on each node?
I think we can provide a higher-level interface where tracer creation/removal is hidden from the client. I implemented a PoC that provides Trace() and Snapshot() methods over gRPC. The server-side implementation has all the logic to create and remove the tracer, and the client only needs to invoke these methods and read the results. For trace it's a stream with the events; for snapshot it's just a string.
The call flow is something like:
I think other gadgets like top, profile, audit and advise can be implemented in a similar way.
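Translated into the Go client bindings protoc would generate, such a PoC interface could look roughly like this; all names and messages are illustrative, not the final API:

```go
// Hypothetical generated client interface for the PoC described above.
type GadgetServiceClient interface {
	// Snapshot runs a snapshot/profile-style gadget once and returns its
	// whole output in a single response.
	Snapshot(ctx context.Context, in *SnapshotRequest, opts ...grpc.CallOption) (*SnapshotResponse, error)
	// Trace starts a stream gadget; the client reads events from the returned
	// stream until it cancels the context or the server closes the stream.
	Trace(ctx context.Context, in *TraceRequest, opts ...grpc.CallOption) (GadgetService_TraceClient, error)
}

// GadgetService_TraceClient mirrors what protoc-gen-go-grpc generates for a
// server-streaming RPC.
type GadgetService_TraceClient interface {
	Recv() (*Event, error)
	grpc.ClientStream
}
```

The matching server interface generated from the same .proto is where the create/remove-tracer logic would live.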
* What's the impact on the advantages listed, and on some not listed?
* CRD = well-known paradigm accessible with HTTP/REST on the API server: would Headlamp plugins need to switch to gRPC?
I think so.
If so, are there good JavaScript libraries to implement a gRPC client?
There is https://github.com/grpc/grpc-web. I'm confident enough that it's possible to generate and use the gRPC API from most popular programming languages quite easily.
* RBAC access: What would replace this?
Good point. TBH this is something I haven't gone through in all its details. My first consideration is that perhaps we don't need a full RBAC implementation. I was thinking that token-based authentication could be enough: a token is generated when deploying IG on the cluster and then the client uses this token to make calls.
Do you have any ideas or comments on this aspect?
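A minimal sketch of how such a token could travel, assuming it is simply carried as gRPC metadata (the header name and the interceptor are made up for this example):

```go
package main

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/metadata"
	"google.golang.org/grpc/status"
)

// Client side: attach the deployment-time token to every outgoing call.
func withToken(ctx context.Context, token string) context.Context {
	return metadata.AppendToOutgoingContext(ctx, "authorization", "Bearer "+token)
}

// Server side: a unary interceptor that rejects calls without the expected
// token. A streaming variant would be needed for the Trace RPC as well.
func authInterceptor(expected string) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (interface{}, error) {
		md, ok := metadata.FromIncomingContext(ctx)
		if !ok || len(md.Get("authorization")) == 0 ||
			md.Get("authorization")[0] != "Bearer "+expected {
			return nil, status.Error(codes.Unauthenticated, "invalid or missing token")
		}
		return handler(ctx, req)
	}
}
```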
* GlobalTrace has an advantage not listed: if nodes are created or deleted while an IG trace is running or starting, the GlobalTrace controller can automatically create/delete the Trace resources accordingly, so the CLI does not have to worry about it.
I agree. We could handle this case if we go with option 2 discussed above. However, I'm not sure this is a very common case that we should worry about.
I also had some ideas and thoughts regarding these changes that I would like to share with you. I'm still new to the project, as you know, so please excuse me if I'm a bit off on some of the topics / don't use the right terms, and feel free to correct me :)
IMHO, there are two main disadvantages of the CR solution as it exists right now:
- lifecycle management: the client has to do all the housekeeping - if you somehow lose the connection while a trace is ongoing, it will stay active AFAIU. Maybe it would be good to introduce an (optional) maximum lifetime, after which a trace would be removed by the gadget tracer manager?
- in case of the streaming gadgets: execing into the (privileged) pod only to fetch some specific information seems like a potential security risk that would be avoided when using a dedicated service over gRPC or HTTP.
About the RBAC / Authentication when switching to gRPC:
- why not use the CR as an enabler or admission ticket instead of as the trace itself, especially in the case of streaming gadgets? So when a trace CR gets created, the manager will see it and annotate it with a randomly generated (secure) token that can be used with a gRPC call to fetch the stream. Additionally, the lifecycle of the trace itself can then be controlled via gRPC (as described above by @mauriciovasquezbernal) and removed on close of the connection.
- the same could be used for profile/snapshot as well, if the target for the output is not the CR itself but it's rather used as a longer-living admission ticket to get periodic results
Additionally, one could define service accounts/tokens with access to all gRPC calls using a static configuration via secrets. Combined with the RBAC based approach, this should provide enough flexibility for users.
There are other advantages along the way of switching to/adding gRPC:
- as you have to define all the data structures that get transported somehow in protocol buffers (with additional annotations for exporting to JSON), you have a pretty clean description of all the interfaces at all times
- protocol buffers are much faster than JSON (at least last time I checked) - so especially for high load situations (sigsnoop for example, I guess), this could have a significant impact on CPU usage
- you basically get a REST/websocket interface for free (so you can use IG with curl only, for example)
Especially the last advantage can make IG much more accessible to other use cases / integrations.
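Independently of how the REST layer would be exposed (e.g. through a proxy such as grpc-gateway), the JSON mapping itself already comes with protobuf; the snippet below only illustrates that point, using structpb.Struct as a stand-in for a real gadget event message:

```go
package main

import (
	"fmt"

	"google.golang.org/protobuf/encoding/protojson"
	"google.golang.org/protobuf/types/known/structpb"
)

func main() {
	// Any generated protobuf message can be marshalled to JSON without extra
	// code; structpb.Struct is used here only because it ships with the library.
	ev, err := structpb.NewStruct(map[string]interface{}{
		"gadget": "sigsnoop",
		"node":   "worker-1",
		"signal": "SIGKILL",
	})
	if err != nil {
		panic(err)
	}
	out, err := protojson.Marshal(ev)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints the event as a JSON object
}
```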
Also, regarding a global tracer: when everything is using gRPC, I could imagine that making every gadget tracer manager a "proxy" that can also aggregate traces from the other instances would be relatively simple. This would further decrease the complexity of the client, be it kubectl-gadget itself or a 3rd party app. The API extension stuff @alban mentioned could be a good choice for that as well, then.
Regarding all of this, I think we are merging two problems:
- How to communicate with gadgettracermanager (i.e. using either gRPC or CRD)?
- Does Inspektor Gadget scale?
Regarding the CRD, I am quite happy with it and I would only replace it if it does not scale well.
Ref https://github.com/inspektor-gadget/inspektor-gadget/pull/1096
Part of this was implemented in https://github.com/inspektor-gadget/inspektor-gadget/pull/1281.
The design is completed and the implementation will be tracked in https://github.com/inspektor-gadget/inspektor-gadget/issues/1409.