Working on AWS Workspaces with AD: k9s fails on startup with unknown userid error
This is a followup on #1842, so the base info from there is still valid and I will not repeat it.
Our team uses k9s to manage our clusters, which has worked well for quite some time on our dev machine (thanks to all contributors for this!)
We now face the same issue when trying to use k9s on Amazon Workspaces: the workspace is connected to an AD, so the user is not available in the /etc/passwd file. Adding it is not an option for us, unfortunately.
Does anybody know any other workaround? @evax, do I understand your comment correctly that the user functions could be a fix that we could implement in k9s? I would like to take a stab at it and create a Merge Request, but I will need some guidance (I do have basic Go knowledge). The originating code seems to be located here.
We are facing this as well with directory-based users that have no local /etc/passwd representation. I am interested in a workaround or some sort of fix.
I'm not currently up on Go, so I will have to be a leech instead of trying to make a PR.
The function MustK9sUser seems to be used for specific use cases only:

...to allow multi-user logging, benchmarks etc.
The crash seems to be raised due to ~the current-function for cgo-disabled Go executions, when trying to find the user's entry in /etc/passwd.~ Since cgo ~is~ seems to be disabled while building k9s, https://github.com/derailed/k9s/blob/15b33c02b2d49806552888e613c13d0e389620c0/.goreleaser.yml#L10 these functions are used.
A possible fix could be to just assume that the USER env variable holds the user's correct name when the conventional lookup with https://github.com/derailed/k9s/blob/b5a7cfb3af43a0a79423683094bd33b1b9605faf/internal/config/helpers.go#L50 fails and the err is user.UnknownUserIdError. If the env variable is empty, k9s could just exit as it does right now.
I wasn't able to test workarounds today because I had no test setup available, but setting --logFile might not work since the default value (which is set during the command initialization) calls MustK9sUser: https://github.com/derailed/k9s/blob/b5a7cfb3af43a0a79423683094bd33b1b9605faf/cmd/root.go#L174-L179
Referring to my last comment: I took another look at the code and found this specific piece of code in Go's source:

The error can not occur by calling user.Current when CGO_ENABLED is set to 0. The error would have been raised in the lookupUserId function (which raises if the user is not in the /etc/passwd file), but Go would then try to get the user information in other ways. If this does not work, then it would fail with another error.
I tried to reproduce the original error by adding some debug output to a self-built version of k9s, using a local AD (not LDAP) domain controller and a client joined to the domain, but couldn't reproduce it. I explicitly built the k9s binary using goreleaser (since I saw the .goreleaser.yml in the project's root).
The binary producing this issue was installed via brew (which should simply download the latest binary from GitHub releases). Unfortunately, I couldn't debug the binary with delve because it was built with the -s -w ldflags. So I tried gdb and found some interesting functions:
These functions can be found in this file:

When I built the binary on my own using goreleaser, I could not list any functions with gdb (nor when building it manually with CGO_ENABLED=0 go build -ldflags="-s -w" -trimpath).
So I built the binaries manually without ldflags and analysed them with gdb, one binary with CGO_ENABLED=1:

and one with CGO_ENABLED=0:

Conclusion:
The binary uploaded to GitHub releases (and therefore used by brew etc.) seems to be built with Cgo enabled, which probably can't handle AD/LDAP users logged in on Linux systems due to some failing syscalls.
@derailed, do you still use goreleaser using the checked-in file when creating a new release? It really seems to me like CGO_ENABLED=0 is ignored when building the binaries (at least the linux-amd64 one). goreleaser build --single-target --snapshot --rm-dist (using goreleaser v1.14.1) created a working Cgo disabled one for me.
So another possible workaround could be to custom build the k9s binary with Cgo explicitly disabled, but I couldn't prove this yet. In the meantime, I'll close my pull request, since it does not make sense when Cgo is disabled (because the error it checks for will never occur).
@StevenKGER First off, thank you so very much for this extensive research and identifying root cause! You Sir ROCK!! I'll take a closer look at the bin releases as I believe cgo is disabled??
In the meantime, I've quickly glanced at your PR and issue use cases and I think the best course of action is to introduce a K9S_USER env var. Unlike in your PR we should check it first. If it is set we should bypass the user.Current call.
The initial impetus for this impl was for folks using k9s on the same box (Yikes!!) and wanted sep k9s artifact dirs. Based on what I've seen it seems a better approach to set an env var so that k9s users don't have to deal with sec concerns and/or /etc/passwd for this very simple use case imho.
Would this make better sense??
You're welcome, I really like to support awesome work, @derailed! :)
I'll take a closer look at the bin releases as I believe cgo is disabled??
Yes, please check the build process binaries. I'll be happy if you share the result with us.
better approach to set an env var so that k9s users don't have to deal with sec concerns
Agreed. But I'm not sure whether this would be "overengineering", since it introduces variables that users with this problem might not find. Sure, we need to document this variable in some way (maybe in an error message when the lookup fails), but the system's USER env should also do it, right? It's the fallback mechanism used by Go itself when Cgo is disabled, it is preset by most (if not all) Linux systems and shells, and it shouldn't raise any security concerns as far as I know.
We probably don't need a change for this specific issue if Cgo is disabled (which it should be according to the goreleaser configuration), but it still might be a good enhancement, right?
But then, we might need to introduce another variable for user's home:
https://github.com/derailed/k9s/blob/b5a7cfb3af43a0a79423683094bd33b1b9605faf/internal/client/helpers.go#L78-L84
By default (per the implementation of os.UserHomeDir) it is fetched from the $HOME env variable, which is set at login on Linux systems, when Cgo is disabled. So this seems to be a safe way as well.
So I would suggest just disabling Cgo in the build process (which won't require a change in the logic of MustK9sUser and therefore no PR), so that this issue is solved. After that, we can create a new issue for an enhancement adding K9S_USER and K9S_HOME_DIR env variables. What do you think about this idea?
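For the home directory part of that enhancement, a short sketch under the same assumptions: K9S_HOME_DIR is only the name suggested in this comment, and homeDir is a hypothetical helper that prefers the variable and otherwise defers to os.UserHomeDir.

```go
package main

import (
	"fmt"
	"os"
)

// envK9sHomeDir is the variable name suggested in this thread.
const envK9sHomeDir = "K9S_HOME_DIR"

// homeDir is a hypothetical helper. It returns K9S_HOME_DIR when set and
// otherwise whatever the fallback (normally os.UserHomeDir) reports.
func homeDir(getenv func(string) string, fallback func() (string, error)) (string, error) {
	if dir := getenv(envK9sHomeDir); dir != "" {
		return dir, nil
	}
	return fallback()
}

func main() {
	dir, err := homeDir(os.Getenv, os.UserHomeDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, "unable to determine home dir:", err)
		return
	}
	fmt.Println(dir)
}
```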
@Morl99 I guess the pertinent Q here is how did you install k9s in your aws workspace?
@Morl99 I guess the pertinent Q here is how did you install k9s in your aws workspace?
Ah, I forgot: I could reproduce the bug on AWS Workspaces with k9s installed via brew (brew install k9s). That's how I found this issue, because it was linked in #1842. Probably @Morl99 installed it in the same way.
@StevenKGER Interesting. Thank you! what about if you download the release binary directly from the repo? I've double checked and it seems the release is correctly compiled with cgo disabled. My suspicion is brew is building the binary for the targeted arch and hence cgo is enabled??
@Morl99 I guess the pertinent Q here is how did you install k9s in your aws workspace?
Ah, I forgot: I could reproduce the bug on AWS Workspaces with k9s installed via brew (brew install k9s). That's how I found this issue, because it was linked in #1842. Probably @Morl99 installed it in the same way.
yes, that is correct.
My suspicion is brew is building the binary for the targeted arch and hence cgo is enabled??
Without double checking: You are absolutely right!
The brew formula for k9s seems to build the binary without CGO_ENABLED=0:
...but only, when a binary was not found (as bottle):

I tried downloading/installing both on a WSL2 installation (x86_64), brew uses the given bottle - but it's different from the github releases binary one:
The brew one has the function lookupUnixUid (-> Cgo enabled) implemented, while the github releases binary one has not, according to gdb.
I don't know the brew build/bottle/... processes very well. (That's why I thought in my previous posts that brew would simply download the releases, ugh.) @derailed, as maintainer, can you check how the binaries are built as "bottles" and change the linked k9s.rb to build with CGO_ENABLED=0? That should then work both for bottle users and for users whose brew installation has to build the binary itself.
@StevenKGER - Awesome!!
Thank you so much Steven for this excellent research!!
I hope many K9sers here will thank you personally for digging this out!
I am puzzled why we fail to pull the rel rev on linux? Which linux arch are you using for your experimentation intel, arm64 or other?? I think we might be missing a linux arch combo in the brew formula??
In any case, I think I've figured out the correct brew incantation but have 2 ways to approach it.
I am going to throw a new drop (aka plan A) over the fence and see if we're once again happy as a hippo.
If not we will go for plan B as sadly my brew fu is weak ;(
Hi @derailed,
thank you very much! Unfortunately, the bottles (and the formula code) still do not seem to be built with Cgo disabled.
On an AWS Workspaces Linux x86_64 machine I still get this message using brew install k9s:
The system is downloading this bottle from brew. If no bottle were available for this system, it would still build without CGO_ENABLED=0.
However, your own brew formula seems to be different. As far as I know, it's for your tap derailed/k9s/k9s, right? Installing this works for me on the AWS Workspaces! In case you can't fix the "standard" k9s package brew install k9s, you might want to change your README.md to point to the tap brew install derailed/k9s/k9s, as https://k9scli.io does.
@StevenKGER Excellent! Thank you!! Right, when you use the bottle vs the tap on linux no match is found, which triggers the cust install. Hence using the tap we match and use the correct linux bin with cgo disabled. Guessing amd64 vs x86_64 pb?? I have a PR on the brew repo to update the build and disable cgo. However the plot thickens as #780 requires it ;( Rats!!
I'll update the README to use the tap for this use case but if you need cgo enabled I think the best course of action for the moment would be to either use go install github.com/derailed/k9s@latest or build from source using the makefile ie turn off xcompiler and go native.
Thank you for the documentation changes, @derailed!
I have a PR on the brew repo to update the build and disable cgo. However the plot thickens as #780 requires it ;(
Well, in that case I'd just enable Cgo as requested in #780 and look up the user with fallbacks as Go does (and as I tried in the PR). I would guess this would be the easiest way to fix both issues (since debugging network stuff is meh)
this helped:
brew install derailed/k9s/k9s
(already mentioned in the readme)
I've installed k9s v0.32.5 via devbox (uses nix under the hood) on GCE VM with os-login enabled (therefore, no entry in /etc/passwd), and I was able to work around this issue by setting a kube cache dir env var:
KUBECACHEDIR="$HOME/.kube/cache"