[v2] Add a way to stop the main container

Open • LukeShu opened this issue 3 years ago • 13 comments

Some workloads (such as controllers) don't respond to incoming traffic, but instead are oriented around a watch to the Kubernetes apiserver. For these workloads, you want to actually stop the main container and only run the copy on the laptop.

This use-case is well-served by Telepresence 1 swap-deployment.

As I write this, an idea occurs to me: it might be worth overriding the main pod's KUBERNETES_SERVICE_HOST and KUBERNETES_SERVICE_PORT env-vars and pointing them at our own apiserver proxy; that way the main container could keep running, but we could stop feeding it events while an intercept is active. However, this only addresses workloads that get their input from the apiserver; it wouldn't help controllers that get their input from elsewhere (perhaps from Consul).

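For context on what that override would rely on: client-go's rest.InClusterConfig() builds the apiserver URL from exactly those two variables, so repointing them at a proxy redirects every connection the main container's in-cluster client opens. A minimal stdlib-only sketch of that derivation (no assumptions beyond client-go's documented behaviour):

```go
package main

import (
	"fmt"
	"net"
	"os"
)

// apiserverURL mirrors how client-go's rest.InClusterConfig() derives the
// apiserver address: it reads KUBERNETES_SERVICE_HOST and
// KUBERNETES_SERVICE_PORT and joins them into an https:// URL.
func apiserverURL() (string, error) {
	host := os.Getenv("KUBERNETES_SERVICE_HOST")
	port := os.Getenv("KUBERNETES_SERVICE_PORT")
	if host == "" || port == "" {
		return "", fmt.Errorf("not running in a cluster")
	}
	return "https://" + net.JoinHostPort(host, port), nil
}

func main() {
	// If Telepresence rewrote these env vars to point at its own proxy,
	// every watch the main container's in-cluster client opens would
	// silently go through that proxy instead of the real apiserver.
	url, err := apiserverURL()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("controller would watch:", url)
}
```
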
LukeShu avatar Apr 07 '21 15:04 LukeShu

Perhaps this is also the solution needed to implement the T1 option --from-pod <port>? Not sure what happens if we simply override ports that the main container is listening to though. I think it must be stopped to avoid conflicts.

thallgren avatar May 06 '21 12:05 thallgren

Relates to #1708.

thallgren avatar May 06 '21 12:05 thallgren

I have the same situation: the container I am debugging does things that are not triggered by incoming traffic, but by timers or other events. I would need it to stop so it doesn't interfere with my local debugging version. In v1, swap-deployment was ideal.

jamesbattersby avatar Jun 25 '21 12:06 jamesbattersby

This is a good feature that I used with Telepresence 1 and miss in the new version. In my use case I have an application that pulls data from a DB or other third-party APIs, so just overriding the default Kubernetes env vars won't help; a more complete solution is required, I guess.

olegsu avatar Jun 29 '21 04:06 olegsu

Hey, I see this comment on #1708 saying that this was released. But in the case where your application does not expose a service, this does not work. Do you guys have a plan to support this at some point? I love working with Telepresence 2, but I can't really continue to work with both versions :-(

olegsu avatar Aug 18 '21 10:08 olegsu

@olegsu not sure I understand. #1708 is about the --to-pod flag. Not about stopping the main container.

thallgren avatar Aug 18 '21 13:08 thallgren

@thallgren Thanks for the clarification. I am still wondering how to use Telepresence 2 to develop a Kubernetes controller, for example :-)

olegsu avatar Aug 18 '21 14:08 olegsu

@olegsu I use tp2 to develop a controller, but I just use tp connect so I can resolve services in my cluster. I don't use tp intercept for controllers, only for apps that require http ingress. I just remove/don't deploy the controller to the cluster and boot the controller up locally, and it all works, since it just needs to connect to the k8s API and a few services I deployed to k8s.

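To make the workflow above concrete, here is a minimal sketch of a controller process run locally under `telepresence connect`; the `orders` namespace and the `redis` service are placeholder names, not anything from this thread:

```go
package main

import (
	"context"
	"fmt"
	"net"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	ctx := context.Background()

	// Talk to the apiserver the normal way, via the developer's kubeconfig;
	// no intercept or in-cluster pod is involved.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Watch-driven work runs locally against the real cluster state.
	pods, err := cs.CoreV1().Pods("orders").List(ctx, metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	fmt.Printf("controller sees %d pods\n", len(pods.Items))

	// Reach an in-cluster dependency by its service DNS name; this resolves
	// from the laptop because `telepresence connect` routes cluster DNS.
	conn, err := net.Dial("tcp", "redis.orders.svc.cluster.local:6379")
	if err != nil {
		panic(err)
	}
	defer conn.Close()
	fmt.Println("reached an in-cluster service from the laptop")
}
```

The in-cluster copy of the controller still has to be scaled to zero (or never deployed) by hand, which is exactly the gap this issue asks Telepresence to close.
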
scottillogical avatar Aug 18 '21 14:08 scottillogical

@scottillogical thank you!! I will try this approach as well 👍

olegsu avatar Aug 18 '21 15:08 olegsu

My setup is more complex and telepresence 2 is not working well.

I have multiple microservices with celery containers (async task workers). All celery containers have 0 replicas and are configured to scale up based on incoming tasks (RabbitMQ).

Telepresence 1's deployment swap just works: it replaces the deployment in the cluster with my local container, which consumes the tasks from the queue.

With Telepresence 2 we would need to manually delete the HPA for the service a developer is currently working on, and restore it afterwards in a reliable manner, so the environment always returns to its default state. If you multiply this by the number of separate development environments (each living in a dedicated namespace), you can see the maintenance overhead.

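For reference, a client-go sketch of roughly what that manual bookkeeping amounts to per service (the `dev-alice` namespace and `celery-worker` HPA names are hypothetical; assumes a client-go recent enough to expose autoscaling/v2):

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	ctx := context.Background()

	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	const ns, name = "dev-alice", "celery-worker" // hypothetical names
	hpas := cs.AutoscalingV2().HorizontalPodAutoscalers(ns)

	// 1. Save the HPA so it can be restored later.
	saved, err := hpas.Get(ctx, name, metav1.GetOptions{})
	if err != nil {
		panic(err)
	}

	// 2. Delete it so the worker deployment stays at zero replicas while
	//    the local copy consumes from the queue.
	if err := hpas.Delete(ctx, name, metav1.DeleteOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("HPA removed; run the worker locally now")

	// 3. Restore it afterwards (in practice this half runs after the
	//    debugging session), clearing server-assigned fields first so the
	//    saved object can be re-created.
	saved.ResourceVersion = ""
	saved.UID = ""
	if _, err := hpas.Create(ctx, saved, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
	fmt.Println("HPA restored")
}
```
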
Is there any plan to address this issue any time soon?

lpakula avatar Aug 19 '21 07:08 lpakula

This is an issue for us too. Telepresence 2 solves problems we had with v1, but it is a non-starter for some workloads because of this. I find myself having to switch between v1 and v2, which is certainly not ideal. We need some way to swap out or disable the existing deployment entirely.

bpfoster avatar Dec 01 '21 18:12 bpfoster

This is a deal-breaker for us for Telepresence v2 unfortunately. We have a number of services that rely on job queue semantics, where an incoming request gets placed in a job queue later to be picked up by an available worker. This worked perfectly with Telepresence v1, which entirely swapped out the deployment for the local process. With v2, it's a fifty-fifty chance that a job gets handled by the (still-running) original pod inside the cluster, which makes the debugging value of telepresence very limited (in fact it almost adds confusion). So for the time being we're stuck with v1.

Edit: Wanted to add that we are very happy with telepresence (v1), kudos to the team! I just wanted to add our use-case to the ticket to (perhaps) give it an additional push in priority.

petergardfjall avatar Apr 05 '22 13:04 petergardfjall

I think this relates to #1608

petergardfjall avatar Apr 06 '22 05:04 petergardfjall

I am hitting this problem with a controller that has an admission webhook. The same process in the pod runs the controller and responds to the webhook. Using telepresence, I can intercept the webhook to run locally, but that leaves the controller running in the pod, which prevents me from also running the controller locally.

If telepresence was able to just replace a container with its proxy, that would work for this use case.

mkjpryor avatar Feb 09 '23 13:02 mkjpryor