cli cf ssh - container does not have app's environment (same issue 3 years later)

cf ssh - container does not have app's environment (same issue 3 years later)

Open drnic opened this issue 5 years ago • 19 comments

When I run cf ssh <app> the terminal session does not have the same environment that the application process has - my $PATH etc is not setup for the interpreter provided by the buildpack, etc.

I must always remember to run /tmp/lifecycle/launcher "app" "$SHELL" "" immediately after cf ssh.

When I raised this in Feb 2016 (3 years ago) in cloudfoundry/cloud_controller_ng#754 the answer was something about Windows support.

Can we please fix this now so that cf ssh always sets up the environment to match the application process from the buildpack?

Apr 22 '19 22:04 drnic

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/165518892

The labels on this github issue will be updated when the story is started.

Apr 22 '19 22:04 cf-gitbot

@tcdowney @ssisil @zrob do you folks know whether the issues described above were addressed in v3 cf ssh? (@emalm who responded to the initial github issue)

Apr 24 '19 17:04 abbyachau

@drnic can you create an issue on https://github.com/cloudfoundry/cloud_controller_ng/issues since this is primarily a CAPI/Diego/Buildpacks Team (via the rootfs and launcher) interaction? We'll be able to track it better there. @cwlbraa has some thoughts on what we might be able to do.

Edit: Never mind for this part, Abby moved it. ^^

Also going to tag @heyjcollins, @sunjayBhatia, @shanks3012 for awareness and Diego/Buildpacks context.

Apr 24 '19 17:04 tcdowney

I feel like this might be a rootfs change to add a login .profile for vcap that does the equivalent of https://github.com/cloudfoundry/buildpackapplifecycle/blob/master/launcher/launcher_unix.go

... the missing env vars, afaict, are the ones that buildpacks write into droplets in the files under /home/vcap/.profile.d/*... it's not necessarily straightforward for CAPI to pull those out of the droplet after staging, though that's another possible way of getting at them.

Apr 24 '19 18:04 cwlbraa

Thank you @tcdowney, appreciate the help. Moving this over to the CAPI github.

Apr 24 '19 19:04 abbyachau

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/165577320

The labels on this github issue will be updated when the story is started.

Apr 24 '19 19:04 cf-gitbot

@drnic FYI, the CF docs at https://docs.cloudfoundry.org/devguide/deploy-apps/ssh-apps.html#ssh-env refer to a /tmp/lifecycle/shell helper binary that can replace the /tmp/lifecycle/launcher "app" "$SHELL" "" command.

I recall that after we encountered the port-mapping issues on Windows 2012R2 containers (cf-dev context at https://lists.cloudfoundry.org/g/cf-dev/topic/6332994), we also realized that there's a theoretical problem with running the launcher command more than once: since it always invokes the .profile.d shell scripts, and since those scripts are arbitrary buildpack-provided content, there's no guarantee that they're idempotent or re-entrant. Ideally those scripts are purely local, in that they're modifying only the environment of the process that sources them, but they could be modifying the mutable contents of the app in arbitrary ways (such as moving files or modifying their contents). For example, the Staticfile buildpack mvs and then erbs the provided nginx.conf file in https://github.com/cloudfoundry/staticfile-buildpack/blob/da1ba710b62633a9a922037e0bd63536203cf413/src/staticfile/finalize/data.go#L22-L23, and while I'd guess that this setup is idempotent I think it may not be re-entrant without a mutex around those two lines.

Apr 25 '19 00:04 emalm

Thanks for mentioning slightly nicer /tmp/lifecycle/shell.

I'll follow up on slack about the second paragraph. I'm not sure if you're writing good news or bad news.

Apr 25 '19 23:04 drnic

Hey, @drnic, that second paragraph is likely bad news for previous ways we thought we could solve the environment duplication problem: they had involved running either the diego-sshd server or each one of the processes it spawns through the launcher binary that loads the profile.d scripts, but that's not always safe to do. Turns out we had some other discussion of this issue on https://github.com/cloudfoundry/diego-ssh/issues/29#issuecomment-276926719, too, but I don't think we've considered it much since then.

Apr 30 '19 00:04 emalm

@emalm Is the issue above only for windows? Can cf ssh have a diff code path for windows vs linux containers; and auto-run the /tmp/lifecycle/shell command only when cf ssh into linux?

Apr 30 '19 00:04 drnic

No, the issue is with the potential danger in invoking the buildpack-provided profile.d/*.sh scripts more than once, which is independent of the OS. There was a separate issue on Windows 2012R2 that we were able to resolve.

Apr 30 '19 05:04 emalm

Where is /tmp/lifecycle/shell implemented? Could we have a slightly different one, say /tmp/lifecycle/ssh-shell that is invoked transparently during cf ssh that doesn't reload profile.d/*.sh scripts into the same env?

actually I'm still a bit confused why it is recommended for me to manually run /tmp/lifecycle/shell after cf ssh, but we can't have this done automatically for us. damn. sorry.

Apr 30 '19 07:04 drnic

It's in the buildpackapplifecycle, at https://github.com/cloudfoundry/buildpackapplifecycle/blob/master/shell/shell.go. But it doesn't sound like you want a version that skips running the profile.d/*.sh scripts, because in the case of your Ruby app those scripts set up the Ruby-specific env vars (BUNDLE_PATH, GEM_HOME, GEM_PATH, etc.) that you need for bundle and so forth.

The platform currently has no idea for a particular application whether it's safe to run the profile.d/*.sh scripts repeatedly in its instances, because they are completely opaque to it. Even if all the system-provided buildpacks provide safe scripts, some app on the platform may have used an externally hosted one whose scripts aren't safe to re-run. The app developer is able to reason about those scripts, though: they can either know that the buildpack they have used for the app provides re-runnable ones, or they can cf ssh into the container and inspect all of them to make sure before deciding whether to invoke them via /tmp/lifecycle/shell. (To be honest, I think https://docs.cloudfoundry.org/devguide/deploy-apps/ssh-apps.html#ssh-env deserves a warning that app devs should do this inspection.) So for the time being we've left it opt-in for the sake of safety. I agree that this experience could be better, but naively re-running those scripts in all cases (whether at the time of launching the diego-sshd server or at the invocation of each one-off cf ssh command) sounds like a recipe for subtle, non-deterministic bugs that actually affect app behavior (such as crashing on start up or corrupting configuration or app content).

Apr 30 '19 14:04 emalm

Also /cc @sclevine in case anything about Buildpacks v3 makes this environment-variable context easier to arrange.

Apr 30 '19 16:04 emalm

What if there was a way for app devs to opt into running the /tmp/lifecycle/shell script using a flag on cf ssh. Maybe something like cf ssh --environment=full (open to better naming here). This would probably be more discoverable/intuitive for CLI users than the section in the docs Eric linked.

As a proof of concept:

ssh -p 2222 -t cf:<app-guid>/0@ssh.<domain> "/tmp/lifecycle/shell"

appears to do the right thing (using cf ssh-code to get the password).

What do you think @drnic @emalm? Will this still be an issue in the 4k8s world?

Aug 17 '20 21:08 Gerg

I like the idea of a flag; albeit I'd prefer that the default was this full environment idea, and a flag for have a raw container.

I paid for Alfred so I could have clipboard shortcuts; and I paste in the magic line that enhances my ssh container. But it is hiding the pain and confusion and yes still needs some UX empathy and love.

Aug 17 '20 22:08 drnic

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/174631297

The labels on this github issue will be updated when the story is started.

Sep 02 '20 18:09 cf-gitbot

workarounds:

get envs from /proc/<PID>/environ
start a "sidecar" server in the entrypoint as port ":9999" which responds with the environment it has and then when executing the task or ssh run eval $(curl localhost:9999) before the actual command

Jun 08 '21 09:06 matti

The documented approach with /tmp/lifecycle/shell doesn't work for apps deployed with Docker images, because /tmp/lifecycle/shell is provided by the buildpacks. What about diego-sshd sourcing the env from /proc/<PID>/environ?

Jun 09 '21 09:06 beyhan

cli cli copied to clipboard

cf ssh - container does not have app's environment (same issue 3 years later)

cli
cli copied to clipboard