G10k hanging at control_repo branch iteration
Occasionally g10k hangs during the control_repo branch iteration. There are 4 child processes that are just sleeping, which looks like a deadlock (checked with ps, pstree, lsof, iotop).
Platform: Linux amd64
Version: GitHub release 0.9.9
```
$ g10k --version
g10k v0.9.9-1-gfc83c96 Build time: 2024-02-08_11:56:54 UTC
```
Config file `g10k.yaml`:

```yaml
---
:cachedir: "./g10k-cache"
use_cache_fallback: true
retry_git_commands: true
ignore_unreachable_modules: false

git:
  default_ref: main

deploy:
  purge_levels: ["deployment", "environment", "puppetfile"]
  purge_allowlist: [ '/.gitkeep', '.latest_revision', '.resource_types', 'resource_types/*.pp', '/.pp' ]
  deployment_purge_allowlist: [ 'production_s' ]
  generate_types: false

sources:
  control_repo: &control_repo
    remote: "https://private_gitlab/control_repo.git"
    basedir: "./environments"
    filter_regex: "^(production|staging)$"
    force_forge_versions: false
    prefix: false
    exit_if_unreachable: true
    invalid_branches: "correct"
    warn_if_branch_is_missing: true
```
Execute like:
```sh
g10k -verbose -debug -branch production -config ./g10k.yaml
```
Logs of problematic execution:
```
...
2025/01/27 14:02:03 Executing git --git-dir g10k-cache/environments/control_repo.git remote -v took 0.00079s
...
2025/01/27 14:02:04 Executing git --git-dir g10k-cache/environments/control_repo.git remote update --prune took 0.37934s
2025/01/27 14:02:04 DEBUG executeCommand(): Executing git --git-dir g10k-cache/environments/control_repo.git branch
2025/01/27 14:02:04 Executing git --git-dir g10k-cache/environments/control_repo.git branch took 0.00108s
2025/01/27 14:02:04 DEBUG Skipping branch main of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Environment staging of source control_repo does not match branch name filter 'production', skipping
2025/01/27 14:02:04 DEBUG Skipping branch test-branch1 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch test-branch2 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG 1(): Resolving environment production of source control_repo
2025/01/27 14:02:04 DEBUG Skipping branch testing-branch3 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing-branch4 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing-branch5 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing-branch6 of source control_repo, because of filter_regex setting
2025/01/27 14:02:04 DEBUG Skipping branch testing_branch7 of source control_repo, because of filter_regex setting
[ hang indefinitely ]
```
Logs of correct execution:
```
...
2025/01/29 08:56:52 Executing git --git-dir g10k-cache/environments/control_repo.git remote -v took 0.00100s
...
2025/01/29 08:56:52 Executing git --git-dir g10k-cache/environments/control_repo.git remote update --prune took 0.26773s
2025/01/29 08:56:52 DEBUG executeCommand(): Executing git --git-dir g10k-cache/environments/control_repo.git branch
2025/01/29 08:56:52 Executing git --git-dir g10k-cache/environments/control_repo.git branch took 0.00146s
2025/01/29 08:56:52 DEBUG Skipping branch main of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Environment staging of source control_repo does not match branch name filter 'production', skipping
2025/01/29 08:56:52 DEBUG Skipping branch test-branch1 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch test-branch2 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing-branch3 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing-branch4 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing-branch5 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG 1(): Resolving environment production of source control_repo
2025/01/29 08:56:52 DEBUG Skipping branch testing-branch6 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG Skipping branch testing_branch7 of source control_repo, because of filter_regex setting
2025/01/29 08:56:52 DEBUG executeCommand(): Executing git --git-dir g10k-cache/environments/control_repo.git rev-parse --verify 'production^{object}'
2025/01/29 08:56:52 Executing git --git-dir g10k-cache/environments/control_repo.git rev-parse --verify 'production^{object}' took 0.00103s
...
```
So the last messages printed in the deadlock case are about the filtering of most branches ( https://github.com/xorpaul/g10k/blob/v0.9.9/puppetfile.go#L105 ) and, for the branch "production", the start of syncing ( https://github.com/xorpaul/g10k/blob/v0.9.9/puppetfile.go#L122 ). At first glance, it seems that the random order of branch iteration is what triggers the deadlock. The issue was reproducible for one system user, but not for another...
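For context on where this can block, the environment resolution roughly follows this pattern (a simplified sketch, not the actual g10k code): one goroutine per branch, throttled by remeh/sizedwaitgroup, with shared bookkeeping behind a sync.Mutex. If any worker never reaches Done(), or sits forever in Lock(), the final Wait() never returns:

```go
package main

import (
	"fmt"
	"sync"

	"github.com/remeh/sizedwaitgroup"
)

func main() {
	branches := []string{"production", "staging", "testing"}

	swg := sizedwaitgroup.New(4) // caps concurrent branch workers, similar to -maxworker
	var mutex sync.Mutex
	resolved := map[string]bool{}

	for _, branch := range branches {
		swg.Add() // blocks once the limit of running workers is reached
		go func(branch string) {
			defer swg.Done() // if a worker never gets here, Wait() below hangs forever

			// ... resolve the Puppet environment for this branch ...

			mutex.Lock() // a worker stuck here also keeps Wait() waiting indefinitely
			resolved[branch] = true
			mutex.Unlock()
		}(branch)
	}

	swg.Wait() // corresponds to the sizedwaitgroup Wait() seen in the stack trace further down
	fmt.Println(resolved)
}
```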
I tried a custom build with more debug messages around the sizedwaitgroup calls and the mutex locks/unlocks, built against the latest go version (1.23), but I couldn't reproduce it yet.
Maybe the sizedwaitgroup library's PR https://github.com/remeh/sizedwaitgroup/pull/9 is related to this...?
Hi! Luckily we hit the deadlock again. (This time all of these branches were enabled, but almost all environments were already up to date.) With SIGQUIT we got the stack traces below:
```
SIGQUIT: quit
PC=0x403f2e m=0 sigcode=0

goroutine 0 [idle]:
runtime/internal/syscall.Syscall6()
/usr/lib/go-1.21/src/runtime/internal/syscall/asm_linux_amd64.s:36 +0xe fp=0x7ffd228d52e0 sp=0x7ffd228d52d8 pc=0x403f2e
runtime/internal/syscall.EpollWait(0x0?, {0x7ffd228d53a8?, 0x0?, 0x0?}, 0x0?, 0x0?)
/usr/lib/go-1.21/src/runtime/internal/syscall/syscall_linux.go:55 +0x45 fp=0x7ffd228d5330 sp=0x7ffd228d52e0 pc=0x403ea5
runtime.netpoll(0xc000026000?)
/usr/lib/go-1.21/src/runtime/netpoll_epoll.go:118 +0xcb fp=0x7ffd228d59c0 sp=0x7ffd228d5330 pc=0x431aab
runtime.findRunnable()
/usr/lib/go-1.21/src/runtime/proc.go:3186 +0x825 fp=0x7ffd228d5ad0 sp=0x7ffd228d59c0 pc=0x43d825
runtime.schedule()
/usr/lib/go-1.21/src/runtime/proc.go:3577 +0xb1 fp=0x7ffd228d5b08 sp=0x7ffd228d5ad0 pc=0x43e991
runtime.park_m(0xc000007ba0?)
/usr/lib/go-1.21/src/runtime/proc.go:3740 +0x11f fp=0x7ffd228d5b50 sp=0x7ffd228d5b08 pc=0x43ee9f
runtime.mcall()
/usr/lib/go-1.21/src/runtime/asm_amd64.s:458 +0x4e fp=0x7ffd228d5b68 sp=0x7ffd228d5b50 pc=0x464fee

goroutine 1 [semacquire]:
runtime.gopark(0xc00001c2a0?, 0xc00001c2a0?, 0x60?, 0x40?, 0xc00018c750?)
/usr/lib/go-1.21/src/runtime/proc.go:398 +0xce fp=0xc000145450 sp=0xc000145430 pc=0x4385ce
runtime.goparkunlock(...)
/usr/lib/go-1.21/src/runtime/proc.go:404
runtime.semacquire1(0xc00005d6d8, 0x30?, 0x1, 0x0, 0xcb?)
/usr/lib/go-1.21/src/runtime/sema.go:160 +0x218 fp=0xc0001454b8 sp=0xc000145450 pc=0x448f78
sync.runtime_Semacquire(0xc00018c780?)
/usr/lib/go-1.21/src/runtime/sema.go:62 +0x25 fp=0xc0001454f0 sp=0xc0001454b8 pc=0x463625
sync.(*WaitGroup).Wait(0xc0001455a0?)
/usr/lib/go-1.21/src/sync/waitgroup.go:116 +0x48 fp=0xc000145518 sp=0xc0001454f0 pc=0x471068
github.com/remeh/sizedwaitgroup.(*SizedWaitGroup).Wait(...)
/home/andpaul/dev/go/src/github.com/xorpaul/g10k/vendor/github.com/remeh/sizedwaitgroup/sizedwaitgroup.go:83
main.resolvePuppetEnvironment(0x0, {0x0, 0x0})
/home/andpaul/dev/go/src/github.com/xorpaul/g10k/puppetfile.go:214 +0x37b fp=0xc000145698 sp=0xc000145518 pc=0x6ca63b
main.main()
/home/andpaul/dev/go/src/github.com/xorpaul/g10k/g10k.go:291 +0xb87 fp=0xc000145f40 sp=0xc000145698 pc=0x6c2507
runtime.main()
/usr/lib/go-1.21/src/runtime/proc.go:267 +0x2bb fp=0xc000145fe0 sp=0xc000145f40 pc=0x43817b
runtime.goexit()
/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000145fe8 sp=0xc000145fe0 pc=0x466e61

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go-1.21/src/runtime/proc.go:398 +0xce fp=0xc00003afa8 sp=0xc00003af88 pc=0x4385ce
runtime.goparkunlock(...)
/usr/lib/go-1.21/src/runtime/proc.go:404
runtime.forcegchelper()
/usr/lib/go-1.21/src/runtime/proc.go:322 +0xb3 fp=0xc00003afe0 sp=0xc00003afa8 pc=0x438453
runtime.goexit()
/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00003afe8 sp=0xc00003afe0 pc=0x466e61
created by runtime.init.6 in goroutine 1
/usr/lib/go-1.21/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
/usr/lib/go-1.21/src/runtime/proc.go:398 +0xce fp=0xc00003b778 sp=0xc00003b758 pc=0x4385ce
runtime.goparkunlock(...)
/usr/lib/go-1.21/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
/usr/lib/go-1.21/src/runtime/mgcsweep.go:280 +0x94 fp=0xc00003b7c8 sp=0xc00003b778 pc=0x424754
runtime.gcenable.func1()
/usr/lib/go-1.21/src/runtime/mgc.go:200 +0x25 fp=0xc00003b7e0 sp=0xc00003b7c8 pc=0x419b05
runtime.goexit()
/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00003b7e8 sp=0xc00003b7e0 pc=0x466e61
created by runtime.gcenable in goroutine 1
/usr/lib/go-1.21/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc00005a000?, 0x7dec98?, 0x1?, 0x0?, 0xc000006d00?)
/usr/lib/go-1.21/src/runtime/proc.go:398 +0xce fp=0xc00003bf70 sp=0xc00003bf50 pc=0x4385ce
runtime.goparkunlock(...)
/usr/lib/go-1.21/src/runtime/proc.go:404
runtime.(*scavengerState).park(0x9cb4a0)
/usr/lib/go-1.21/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00003bfa0 sp=0xc00003bf70 pc=0x422029
runtime.bgscavenge(0x0?)
/usr/lib/go-1.21/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc00003bfc8 sp=0xc00003bfa0 pc=0x4225bc
runtime.gcenable.func2()
/usr/lib/go-1.21/src/runtime/mgc.go:201 +0x25 fp=0xc00003bfe0 sp=0xc00003bfc8 pc=0x419aa5
runtime.goexit()
/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00003bfe8 sp=0xc00003bfe0 pc=0x466e61
created by runtime.gcenable in goroutine 1
/usr/lib/go-1.21/src/runtime/mgc.go:201 +0xa5

goroutine 5 [finalizer wait]:
runtime.gopark(0x40c9fe?, 0x7f882d40af20?, 0x0?, 0x0?, 0x0?)
/usr/lib/go-1.21/src/runtime/proc.go:398 +0xce fp=0xc00003a620 sp=0xc00003a600 pc=0x4385ce
runtime.runfinq()
/usr/lib/go-1.21/src/runtime/mfinal.go:193 +0x107 fp=0xc00003a7e0 sp=0xc00003a620 pc=0x418b27
runtime.goexit()
/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00003a7e8 sp=0xc00003a7e0 pc=0x466e61
created by runtime.createfing in goroutine 1
/usr/lib/go-1.21/src/runtime/mfinal.go:163 +0x3d

goroutine 17 [sync.Mutex.Lock]:
runtime.gopark(0x7f882d4083d0?, 0x7f882d3ff108?, 0x0?, 0x4c?, 0x7f882d409438?)
/usr/lib/go-1.21/src/runtime/proc.go:398 +0xce fp=0xc000074930 sp=0xc000074910 pc=0x4385ce
runtime.goparkunlock(...)
/usr/lib/go-1.21/src/runtime/proc.go:404
runtime.semacquire1(0x9fa584, 0x0?, 0x3, 0x1, 0xea?)
/usr/lib/go-1.21/src/runtime/sema.go:160 +0x218 fp=0xc000074998 sp=0xc000074930 pc=0x448f78
sync.runtime_SemacquireMutex(0x8e?, 0x3b?, 0x3b?)
/usr/lib/go-1.21/src/runtime/sema.go:77 +0x25 fp=0xc0000749d0 sp=0xc000074998 pc=0x4636e5
sync.(*Mutex).lockSlow(0x9fa580)
/usr/lib/go-1.21/src/sync/mutex.go:171 +0x15d fp=0xc000074a20 sp=0xc0000749d0 pc=0x46f9bd
sync.(*Mutex).Lock(...)
/usr/lib/go-1.21/src/sync/mutex.go:90
main.resolvePuppetEnvironment.func1.1({0xc00020c02c, 0x12}, {{0xc00001a0c0, 0x3b}, {0xc000012cc0, 0xe}, {0xc000012cf0, 0x5}, {0x0, 0x0}, ...}, ...)
/home/andpaul/dev/go/src/github.com/xorpaul/g10k/puppetfile.go:152 +0x832 fp=0xc000074f28 sp=0xc000074a20 pc=0x6cc312
main.resolvePuppetEnvironment.func1.3()
/home/andpaul/dev/go/src/github.com/xorpaul/g10k/puppetfile.go:197 +0x66 fp=0xc000074fe0 sp=0xc000074f28 pc=0x6cbaa6
runtime.goexit()
/usr/lib/go-1.21/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc000074fe8 sp=0xc000074fe0 pc=0x466e61
created by main.resolvePuppetEnvironment.func1 in goroutine 6
/home/andpaul/dev/go/src/github.com/xorpaul/g10k/puppetfile.go:119 +0xddc

rax    0xfffffffffffffffc
rbx    0x4
rcx    0x403f2e
rdx    0x80
rdi    0x4
rsi    0x7ffd228d53a8
rbp    0x7ffd228d5320
rsp    0x7ffd228d52d8
r8     0x0
r9     0x0
r10    0xffffffffffffffff
r11    0x246
r12    0x7ffd228d53b0
r13    0x7947942882882
r14    0x9cb880
r15    0x1f
rip    0x403f2e
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

ERROR: Exiting.
```
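For anyone else chasing this: a dump like the one above can be forced out of any stuck Go process by sending SIGQUIT (Ctrl+\ in a foreground terminal); the flags below simply mirror the invocation from earlier in this report:

```sh
# reproduce the hang in the background, then ask the Go runtime for a goroutine dump
./g10k -verbose -debug -branch production -config ./g10k.yaml &
kill -QUIT $!    # the runtime prints every goroutine's stack to stderr and exits

# GOTRACEBACK=crash can additionally leave a core dump for offline inspection (ulimit permitting)
```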
We've never encountered this before.
How many branches are you updating?
Did you change the default count of any workers? -maxworker or -maxextractworker?
But yes, given your stack trace it could be caused by either remeh/sizedwaitgroup or the mutex.Lock() in puppetfile.go.
I still need to investigate this further, but right now I can't even get my own build to work anymore, as it does not terminate even with the -version parameter.
```
$ export CGO_ENABLED=0 ; export BUILDTIME=$(date -u '+%Y-%m-%d_%H:%M:%S') ; export BUILDVERSION=$(git describe --tags) ; /usr/local/go/bin/go build -ldflags "-X main.buildtime=${BUILDTIME} -X main.buildversion=${BUILDVERSION}" && date && ./g10k -version
Fri Feb 7 05:27:07 PM CET 2025
g10k v0.9.9-9-g71ddd11 Build time: 2025-02-07_16:27:07 UTC
```
We're seeing hangs in Debian as well. I can't reproduce them in the Docker container (which is based on alpine and golang 1.17). Debian's package is built with golang 1.23, so I wonder if that might be the issue here. But I can't reproduce it by patching that Dockerfile to switch from 1.17 to 1.23...
So there might be something wrong with our build toolchain, but i figured i would mention it here since others are having similar issues...
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1093015
and i just tried to reproduce it in a debian container, and i can't.
so it's something about my desktop or environment that triggers this.
i'll also note that the package from debian bookworm (0.9.7) doesn't have that issue.
Hi! @xorpaul no changes to max workers or similar settings. Just to be clear, the issue is not always reproducible. For example, on the same host one user has virtually no problem, while for another it hangs almost always.
I'm building and testing after replacing sync.Mutex with delock.Mutex (built against golang 1.23). Hopefully I'll come back with more info.
> I'm building and testing after replacing sync.Mutex with delock.Mutex (built against golang 1.23). Hopefully I'll come back with more info.

do you have an actual patch showing what you did there? i'm curious..
Hi @anarcat! These are the changes: https://github.com/xorpaul/g10k/commit/783e104e884b71288a05324eaa8fd8c6349c3385 (note that I kept the same code lines, so any debugging info applies to the original code as well)
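For readers who want the general idea without digging through the commit: a deadlock-detecting mutex keeps the sync.Mutex semantics but complains when a Lock() takes suspiciously long. A minimal sketch with a hypothetical DebugMutex (an illustration of the approach, not the delock API):

```go
package main

import (
	"log"
	"sync"
	"time"
)

// DebugMutex wraps sync.Mutex and logs a warning when acquiring the lock
// takes suspiciously long, without changing the locking behaviour.
type DebugMutex struct {
	mu sync.Mutex
}

func (d *DebugMutex) Lock(where string) {
	acquired := make(chan struct{})
	go func() {
		select {
		case <-acquired:
			// lock obtained in time, nothing to report
		case <-time.After(30 * time.Second):
			log.Printf("still waiting for the lock at %s after 30s - possible deadlock", where)
		}
	}()
	d.mu.Lock()
	close(acquired)
}

func (d *DebugMutex) Unlock() { d.mu.Unlock() }

func main() {
	var m DebugMutex
	m.Lock("example call site")
	defer m.Unlock()
	log.Println("lock acquired")
}
```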
Built with:

```sh
go get
go mod vendor
export CGO_ENABLED=0
export BUILDTIME=$(date -u '+%Y-%m-%d_%H:%M:%S')
export BUILDVERSION=$(git describe --tags)
go build -ldflags "-X main.buildtime=${BUILDTIME} -X main.buildversion=${BUILDVERSION}"
./g10k -version
```
i see this hang with that patch as well.
note that i don't even need to have a Puppetfile or any special environment: just running g10k -version in an empty temp directory triggers the issue.
what i find really puzzling is that i can't reproduce with the exact same package, in a clean Debian container. this means it's something with the glibc runtime or something, because it's not the environment either, env -i still triggers it.
what a weird bug.
here's my backtrace:
```
anarcat@angela:/t/tmp.9qpX0l7YLs> g10k -version
g10k 0.9.9 Build time: 1739301748 UTC
^\SIGQUIT: quit
PC=0x4748a1 m=0 sigcode=128
goroutine 0 gp=0xa59600 m=0 mp=0xa5a620 [idle]:
runtime.futex(0xa5a760, 0x80, 0x0, 0x0, 0x0, 0x0)
runtime/sys_linux_amd64.s:557 +0x21 fp=0x7fff6bf7a858 sp=0x7fff6bf7a850 pc=0x4748a1
runtime.futexsleep(0x7fff6bf7a8d0?, 0x40c030?, 0x47448d?)
runtime/os_linux.go:69 +0x30 fp=0x7fff6bf7a8a8 sp=0x7fff6bf7a858 pc=0x431290
runtime.notesleep(0xa5a760)
runtime/lock_futex.go:170 +0x87 fp=0x7fff6bf7a8e0 sp=0x7fff6bf7a8a8 pc=0x40c1c7
runtime.mPark(...)
runtime/proc.go:1866
runtime.stopm()
runtime/proc.go:2886 +0x8c fp=0x7fff6bf7a910 sp=0x7fff6bf7a8e0 pc=0x43c5cc
runtime.findRunnable()
runtime/proc.go:3623 +0xd5c fp=0x7fff6bf7aa88 sp=0x7fff6bf7a910 pc=0x43e03c
runtime.schedule()
runtime/proc.go:3996 +0xb1 fp=0x7fff6bf7aac0 sp=0x7fff6bf7aa88 pc=0x43f111
runtime.goschedImpl(0xc0000061c0, 0x80?)
runtime/proc.go:4137 +0x185 fp=0x7fff6bf7ab10 sp=0x7fff6bf7aac0 pc=0x43f6e5
runtime.gosched_m(0xc0000061c0?)
runtime/proc.go:4142 +0x15 fp=0x7fff6bf7ab30 sp=0x7fff6bf7ab10 pc=0x43f855
runtime.mcall()
runtime/asm_amd64.s:459 +0x4e fp=0x7fff6bf7ab48 sp=0x7fff6bf7ab30 pc=0x470c2e
goroutine 1 gp=0xc0000061c0 m=nil [runnable]:
runtime.Gosched()
runtime/proc.go:353 +0x14 fp=0xc0000dd628 sp=0xc0000dd610 pc=0x437d14
internal/runtime/exithook.Run(0x0)
internal/runtime/exithook/hooks.go:61 +0x36 fp=0xc0000dd688 sp=0xc0000dd628 pc=0x477076
runtime.runExitHooks(...)
runtime/proc.go:320
os.runtime_beforeExit(0xc0000a0410?)
runtime/proc.go:307 +0x14 fp=0xc0000dd6a0 sp=0xc0000dd688 pc=0x46af34
os.Exit(0x0)
os/proc.go:75 +0x2e fp=0xc0000dd6c0 sp=0xc0000dd6a0 pc=0x4bff4e
main.main()
github.com/xorpaul/g10k/g10k.go:259 +0x91e fp=0xc0000ddf50 sp=0xc0000dd6c0 pc=0x709d9e
runtime.main()
runtime/proc.go:272 +0x28b fp=0xc0000ddfe0 sp=0xc0000ddf50 pc=0x43794b
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000ddfe8 sp=0xc0000ddfe0 pc=0x472ae1
goroutine 2 gp=0xc000006700 m=nil [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000064fa8 sp=0xc000064f88 pc=0x46b02e
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.forcegchelper()
runtime/proc.go:337 +0xb3 fp=0xc000064fe0 sp=0xc000064fa8 pc=0x437c93
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000064fe8 sp=0xc000064fe0 pc=0x472ae1
created by runtime.init.7 in goroutine 1
runtime/proc.go:325 +0x1a
goroutine 3 gp=0xc000006c40 m=nil [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000065780 sp=0xc000065760 pc=0x46b02e
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.bgsweep(0xc00001e180)
runtime/mgcsweep.go:277 +0x94 fp=0xc0000657c8 sp=0xc000065780 pc=0x422734
runtime.gcenable.gowrap1()
runtime/mgc.go:204 +0x25 fp=0xc0000657e0 sp=0xc0000657c8 pc=0x417045
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000657e8 sp=0xc0000657e0 pc=0x472ae1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:204 +0x66
goroutine 4 gp=0xc000006e00 m=nil [GC scavenge wait]:
runtime.gopark(0xc00001e180?, 0x83f970?, 0x1?, 0x0?, 0xc000006e00?)
runtime/proc.go:424 +0xce fp=0xc000065f78 sp=0xc000065f58 pc=0x46b02e
runtime.goparkunlock(...)
runtime/proc.go:430
runtime.(*scavengerState).park(0xa59200)
runtime/mgcscavenge.go:425 +0x49 fp=0xc000065fa8 sp=0xc000065f78 pc=0x420169
runtime.bgscavenge(0xc00001e180)
runtime/mgcscavenge.go:653 +0x3c fp=0xc000065fc8 sp=0xc000065fa8 pc=0x4206dc
runtime.gcenable.gowrap2()
runtime/mgc.go:205 +0x25 fp=0xc000065fe0 sp=0xc000065fc8 pc=0x416fe5
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc000065fe8 sp=0xc000065fe0 pc=0x472ae1
created by runtime.gcenable in goroutine 1
runtime/mgc.go:205 +0xa5
goroutine 5 gp=0xc000007340 m=nil [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0xc0000646d8?)
runtime/proc.go:424 +0xce fp=0xc000064620 sp=0xc000064600 pc=0x46b02e
runtime.runfinq()
runtime/mfinal.go:193 +0x107 fp=0xc0000647e0 sp=0xc000064620 pc=0x4160c7
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000647e8 sp=0xc0000647e0 pc=0x472ae1
created by runtime.createfing in goroutine 1
runtime/mfinal.go:163 +0x3d
goroutine 6 gp=0xc0000076c0 m=nil [chan receive]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
runtime/proc.go:424 +0xce fp=0xc000066718 sp=0xc0000666f8 pc=0x46b02e
runtime.chanrecv(0xc000022380, 0x0, 0x1)
runtime/chan.go:639 +0x41c fp=0xc000066790 sp=0xc000066718 pc=0x406c9c
runtime.chanrecv1(0x0?, 0x0?)
runtime/chan.go:489 +0x12 fp=0xc0000667b8 sp=0xc000066790 pc=0x406852
runtime.unique_runtime_registerUniqueMapCleanup.func1(...)
runtime/mgc.go:1781
runtime.unique_runtime_registerUniqueMapCleanup.gowrap1()
runtime/mgc.go:1784 +0x2f fp=0xc0000667e0 sp=0xc0000667b8 pc=0x419e6f
runtime.goexit({})
runtime/asm_amd64.s:1700 +0x1 fp=0xc0000667e8 sp=0xc0000667e0 pc=0x472ae1
created by unique.runtime_registerUniqueMapCleanup in goroutine 1
runtime/mgc.go:1779 +0x96
rax 0xca
rbx 0x0
rcx 0x4748a3
rdx 0x0
rdi 0xa5a760
rsi 0x80
rbp 0x7fff6bf7a898
rsp 0x7fff6bf7a850
r8 0x0
r9 0x0
r10 0x0
r11 0x286
r12 0x7fff6bf7a8d0
r13 0x1
r14 0xa59600
r15 0xf
rip 0x4748a1
rflags 0x286
cs 0x33
fs 0x0
gs 0x0
```
what's puzzling there is that the backtrace has practically nothing inside g10k itself, it's all in the stdlib. looking at main() there's nothing but flag parsing there... what could possibly be triggering this other than a fault inside the golang runtime itself?
Yeah, I'm also confused by this. I was able to compile and run it at the end of 2024, but now I can't.
Maybe downgrading the golang version will help?
well, it's interesting. the bug doesn't trigger if you remove everything up to the flag parser and all files but config.go, g10k.go and helper.go.
i'll keep bisecting like this if i can figure out which source file is triggering the deadlock.
update: that's about as far as I can go... all other files are too coupled with each other for this to build at all.
for what it's worth, i can't reproduce this by building g10k with golang 1.19 on debian trixie. so it definitely sounds like a regression somewhere in golang, between 1.19 and 1.23.
i'd recommend hooking this up to CI here in a matrix that would test those different versions.
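In the meantime, such a matrix can be approximated locally with the official golang Docker images (a rough sketch, run from a g10k checkout; the 10-second timeout is only there to flag a hanging -version):

```sh
for v in 1.19 1.21 1.23 1.24; do
  docker run --rm -v "$PWD":/src -w /src "golang:$v" \
    sh -c 'CGO_ENABLED=0 go build -o g10k . && timeout 10 ./g10k -version' \
    || echo "go $v: build failed or g10k -version hung"
done
```

Though, as noted above, a container without a real terminal attached may not reproduce this particular hang at all.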
This looks like a different problem (but you never know...), like a waiting channel that blocks the cleanup of a top-level structure. (Googling unique_runtime_registerUniqueMapCleanup and chanrecv is not of great help...)
I could not reproduce anarcat's issue, building with either CGO_ENABLED=1 or CGO_ENABLED=0 (static binary). Tried:
- Ubuntu 20.04, go 1.23.5 (manual install)
- Debian 12, go 1.19.8 (golang deb package 2:1.19~1)
- Docker image of Debian Trixie, go 1.23.5 (golang deb package 2:1.23~2)
I also tried the latest GitHub release's binary (go 1.21) inside a trixie container, like:
```sh
make clean
cp /path/to/official/release/g10k ./

cat > Dockerfile.trixie <<'EOF'
FROM debian:trixie-slim
COPY ./g10k /usr/bin/
ENTRYPOINT [ "/usr/bin/g10k" ]
CMD ["-version"]
EOF

podman build -t g10k:test -f Dockerfile.trixie .
podman run --rm g10k:test
```

@anarcat please share more info about your environment (go env) and maybe the output of:

```sh
strace ./g10k -version
ps aux | grep [g]10k
pstree -lpnghs $G10K_PID
lsof -p $G10K_PID
```

etc, etc
Is your issue 100% reproducible, or does it look a bit random?
> `strace ./g10k -version`

that thing is massive, mainly a lot of futex failures
> `ps aux | grep [g]10k`

```
USER        PID %CPU %MEM     VSZ  RSS TTY   STAT START TIME COMMAND
anarcat  255071  191  0.0 1749112 8432 pts/4 Sl+  08:57 0:26 g10k -version
```

> `pstree -lpnghs $G10K_PID`

```
anarcat@angela:~> pstree -lpnghs 255071 (main)
systemd(1,1)───systemd(3829,3829)───sh(237676,237675)───foot(237677,237675)───fish(237681,237681)───g10k(255071,255071)─┬─{g10k}(255072,255071)
├─{g10k}(255073,255071)
├─{g10k}(255074,255071)
├─{g10k}(255075,255071)
├─{g10k}(255076,255071)
├─{g10k}(255077,255071)
└─{g10k}(255090,255071)
```

> `lsof -p $G10K_PID`

```
anarcat@angela:~> lsof -p 255071 (main)
lsof: WARNING: can't stat() tracefs file system /sys/kernel/debug/tracing
Output information may be incomplete.
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
g10k 255071 anarcat cwd DIR 253,3 147456 26214652 /home/anarcat/dist
g10k 255071 anarcat rtd DIR 253,1 4096 2 /
g10k 255071 anarcat txt REG 253,1 6690752 266450 /usr/bin/g10k
g10k 255071 anarcat mem REG 253,1 2003408 310441 /usr/lib/x86_64-linux-gnu/libc.so.6
g10k 255071 anarcat mem REG 253,1 222968 307843 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
g10k 255071 anarcat 0u CHR 136,4 0t0 7 /dev/pts/4
g10k 255071 anarcat 1u CHR 136,4 0t0 7 /dev/pts/4
g10k 255071 anarcat 2u CHR 136,4 0t0 7 /dev/pts/4
g10k 255071 anarcat 3w CHR 5,0 0t0 11 /dev/tty
g10k 255071 anarcat 4u a_inode 0,16 0 2072 [eventpoll:3,5,6]
g10k 255071 anarcat 5u a_inode 0,16 0 2072 [eventfd:104]
g10k 255071 anarcat 6w CHR 5,0 0t0 11 /dev/tty
```
etc, etc
what else?
> Is your issue 100% reproducible, or does it look a bit random?

it's 100% reproducible on this desktop. as i said earlier, i can't reproduce inside a debian container, so there's really something off with this issue.
it might not be the same as the issue described here. i just hooked up here because it looked similar.
oh, and here's a "normal" strace run in a debian unstable container, where the issue is not reproduced.
oh wow.
it's a terminal issue.
if i run g10k -version inside gnome-terminal, the bug doesn't trigger. but if i run g10k inside foot or alacritty, then the process hangs.
i got the hint when i found this in the strace:
```
254913 openat(AT_FDCWD, "/dev/tty", O_WRONLY|O_CLOEXEC <unfinished ...>
```

that should normally be:

```
[pid 3493] openat(AT_FDCWD, "/dev/tty", O_WRONLY|O_CLOEXEC) = 6
```
this is presumably why it can't be reproduced inside podman, because it messes with the terminal.
Hi! Thanks for the straces, I'll take a look. I think the slightly different runtime environment (even a different terminal) affects the race conditions and whether the bug appears. I added a few more commits to my fork, also embedding pprof, which should show something when g10k is hanging, by browsing to http://localhost:6060. You can even try the build in a debian sid container: https://github.com/nickTh-tyn/g10k/blob/master/Dockerfile.sid-build
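For reference, embedding the profiler is normally just a blank import plus a background HTTP listener (a minimal sketch of the standard net/http/pprof pattern; the fork may wire it up differently):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// serve the profiling endpoints in the background while the program keeps working
	go func() {
		log.Println(http.ListenAndServe("localhost:6060", nil))
	}()

	// ... the rest of the program would run here; block so the sketch stays alive
	select {}
}
```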
My last suspicion was the promoted mutex in the LatestForgeModules type ( https://github.com/xorpaul/g10k/blob/v0.9.9/g10k.go#L67 ), so I made it a regular struct field ( https://github.com/xorpaul/g10k/commit/4f36baa756f40551b8ce22473a8b4ee04e9bfd34 )...
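For readers unfamiliar with the terminology: an embedded sync.Mutex gets its Lock/Unlock methods promoted onto the outer type, and copying such a struct by value also copies the lock, which silently breaks mutual exclusion; a named field keeps the locking explicit. A generic illustration (hypothetical types, not the exact g10k structs):

```go
package main

import "sync"

// Embedded ("promoted") mutex: Lock/Unlock become methods of the outer type,
// and any copy of the struct carries its own, independent copy of the lock.
type forgeCacheEmbedded struct {
	sync.Mutex
	versions map[string]string
}

// Named field: callers must lock through the field explicitly, which makes
// the locking (and any accidental copying) easier to spot in review.
type forgeCacheNamed struct {
	mutex    sync.Mutex
	versions map[string]string
}

func main() {
	a := forgeCacheEmbedded{versions: map[string]string{}}
	a.Lock()
	a.versions["puppetlabs/stdlib"] = "9.4.1"
	a.Unlock()

	b := forgeCacheNamed{versions: map[string]string{}}
	b.mutex.Lock()
	b.versions["puppetlabs/stdlib"] = "9.4.1"
	b.mutex.Unlock()
}
```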
@nickTh-tyn did you see the part where i can't reproduce inside gnome-terminal but i can under foot and alacritty? have you tried reproducing that?
@anarcat you are right, it's the terminal. It was hanging at my workstation as well when running under xterm (gnome-shell and terminator had no issue). High CPU usage, and pprof shows that the time is spent mostly in lock/unlock.
```
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=2
...
(pprof) top10
Showing nodes accounting for 1940ms, 75.19% of 2580ms total
Dropped 10 nodes (cum <= 12.90ms)
Showing top 10 nodes out of 46
      flat  flat%   sum%        cum   cum%
     360ms 13.95% 13.95%      430ms 16.67%  runtime.lock2
     340ms 13.18% 27.13%      480ms 18.60%  runtime.casgstatus
     290ms 11.24% 38.37%      350ms 13.57%  runtime.unlock2
     220ms  8.53% 46.90%      220ms  8.53%  runtime.nanotime (inline)
     180ms  6.98% 53.88%      180ms  6.98%  runtime.futex
     170ms  6.59% 60.47%      170ms  6.59%  internal/runtime/atomic.(*Int32).CompareAndSwap
     120ms  4.65% 65.12%     1530ms 59.30%  runtime.schedule
     110ms  4.26% 69.38%      120ms  4.65%  runtime.globrunqget
      90ms  3.49% 72.87%      980ms 37.98%  runtime.findRunnable
      60ms  2.33% 75.19%       60ms  2.33%  runtime.(*mLockProfile).recordUnlock
```
But github release (go 1.21) has no issue (at least at my workstation).
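Side note: when it hangs like that, the goroutine profile from the same embedded endpoint is usually the most telling, since it shows what every goroutine is currently blocked on (assuming the fork's pprof server is listening on localhost:6060):

```sh
# full text dump of all goroutines and their current blocking points
curl -s 'http://localhost:6060/debug/pprof/goroutine?debug=2'

# or interactively, to see where the blocked goroutines pile up
go tool pprof http://localhost:6060/debug/pprof/goroutine
```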
that is the weirdest bug this month.
For me, under xterm, the /dev/tty openat syscalls are successful and I see only futex failures:
```
...
futex(0xc00004e848, FUTEX_WAKE_PRIVATE, 1) = 1
fcntl(1, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fcntl(2, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
openat(AT_FDCWD, "/dev/tty", O_WRONLY|O_CLOEXEC) = 3
fcntl(3, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fcntl(3, F_SETFL, O_WRONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_create1(EPOLL_CLOEXEC) = 4
eventfd2(0, EFD_CLOEXEC|EFD_NONBLOCK) = 5
epoll_ctl(4, EPOLL_CTL_ADD, 5, {EPOLLIN, {u32=10983520, u64=10983520}}) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 3, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3456106497, u64=9221515660956270593}}) = 0
fcntl(3, F_GETFL) = 0x8801 (flags O_WRONLY|O_NONBLOCK|O_LARGEFILE)
fcntl(3, F_SETFL, O_WRONLY|O_LARGEFILE) = 0
ioctl(3, TIOCGWINSZ, {ws_row=64, ws_col=245, ws_xpixel=1474, ws_ypixel=836}) = 0
openat(AT_FDCWD, "/dev/tty", O_WRONLY|O_CLOEXEC) = 6
fcntl(6, F_GETFL) = 0x8001 (flags O_WRONLY|O_LARGEFILE)
fcntl(6, F_SETFL, O_WRONLY|O_NONBLOCK|O_LARGEFILE) = 0
epoll_ctl(4, EPOLL_CTL_ADD, 6, {EPOLLIN|EPOLLOUT|EPOLLRDHUP|EPOLLET, {u32=3437756417, u64=9221515660937920513}}) = 0
fcntl(6, F_GETFL) = 0x8801 (flags O_WRONLY|O_NONBLOCK|O_LARGEFILE)
fcntl(6, F_SETFL, O_WRONLY|O_LARGEFILE) = 0
ioctl(6, TIOCGWINSZ, {ws_row=64, ws_col=245, ws_xpixel=1474, ws_ypixel=836}) = 0
ioctl(1, TCGETS, 0xc0000a4dac) = -1 ENOTTY (Inappropriate ioctl for device)
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
write(1, "g10k v0.9.9 Build time: 2025-0"..., 50g10k v0.9.9 Build time: 2025-02-12_15:37:29 UTC
) = 50
futex(0xc00004e848, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc000100148, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xc00004e848, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0xa59a60, FUTEX_WAIT_PRIVATE, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
...
```
BTW I tried to refresh all libraries (go list -mod=mod -u -m all; go get -u; go mod vendor; go clean -cache; make clean; ...), as I suspected outdated dependencies like golang.org/x/term or github.com/mattn/go-isatty, but the hang at the app's exit under xterm was not fixed...
And I confirm that, even with updated dependencies, the bookworm static binary works fine (go 1.19.8), while trixie/sid (go 1.23.6) is problematic with respect to the exit issue (I pushed sample Dockerfiles for building).
@xorpaul @anarcat I would propose updating all dependencies to their latest versions, tagging and publishing a new GitHub release (built with go 1.19 - 1.21, which has been tested), and in parallel opening a new issue about the hang at exit (g10k -version) for visibility. That would make it easier to link possible new issues with newer Go versions in the dependencies themselves.
I can now finally compile and execute the binary again without any problems, no matter the terminal I use.
For me everything was fixed with the latest golang release, 1.24, which was only released two days ago (2025-02-11): https://go.dev/doc/devel/release
```
$ /tmp/go/bin/go version
go version go1.23.6 linux/amd64
$ export CGO_ENABLED=0 ; /tmp/go/bin/go build -ldflags "-X main.buildtime=${BUILDTIME} -X main.buildversion=${BUILDVERSION}" && date && ./g10k --version
Thu Feb 13 01:52:43 PM CET 2025
g10k Build time: UTC
^C hangs indefinitely
```
```
$ go version
go version go1.24.0 linux/amd64
$ export CGO_ENABLED=0 ; go build -ldflags "-X main.buildtime=${BUILDTIME} -X main.buildversion=${BUILDVERSION}" && date && ./g10k --version
Thu Feb 13 01:52:57 PM CET 2025
g10k Build time: UTC
$ returns immediately as it should
```
Can you two also try it with the newest golang version?
Hi! It works with go 1.24.0.
Ok, I've fixed all the GitHub CI tests, updated the vendor/modules/dependencies and built a new release using go 1.24.0: https://github.com/xorpaul/g10k/releases/tag/v0.9.10
@nickTh-tyn Hopefully this should also help with your stuck g10k runs.
that is great news, but i'm sorry to report that i can't reproduce the fix on debian: building against 1.24 doesn't fix the problem, so perhaps there's some other dependency i need to update here as well.
sigh.
Thanks a lot @xorpaul! There's a good chance it's the same issue, as my colleague who was facing the problem had switched to konsole recently... We'll start using your latest release to check.
@anarcat Did you try to build it against the current master branch? I've also updated all dependencies/modules.