
SIGSEGV: segmentation violation on acr purge

Open nanoq66 opened this issue 1 year ago • 2 comments

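Describe the bug The acr CLI sporadically panics with a nil pointer dereference (SIGSEGV) after acr purge has been called in a loop for a few hours. The stack trace below was captured from acr login on a retry.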

To Reproduce Steps to reproduce the behavior:

  1. Use a workload identity to create a token that is valid for 24h
  2. Log in via acr login registryname.azurecr.io -u 00000000-0000-0000-0000-000000000000 --password-stdin < /acr/docker-token.txt
  3. Continuously call acr -r registryname purge --filter "${repo}:^(?i)(feature|renovate|task|fix|hotfix|update|local-SNAPSHOT|development|)" --ago 30d --untagged in a while loop for a few hours
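
The loop in step 3 can be sketched roughly as follows. This is a minimal illustration, not the original script: the names REGISTRY, ACR_BIN, FILTER_TAGS and purge_repos are placeholders introduced here, and only the acr flags shown in the steps above are used.

```shell
#!/usr/bin/env bash
# Sketch of the reproduction loop. REGISTRY, ACR_BIN, FILTER_TAGS and
# purge_repos are illustrative names, not taken from the original script.
REGISTRY="${REGISTRY:-registryname}"
ACR_BIN="${ACR_BIN:-acr}"
FILTER_TAGS='^(?i)(feature|renovate|task|fix|hotfix|update|local-SNAPSHOT|development|)'

purge_repos() {
  # Purge each repository passed as an argument; stop at the first failure.
  local repo
  for repo in "$@"; do
    "$ACR_BIN" -r "$REGISTRY" purge \
      --filter "${repo}:${FILTER_TAGS}" \
      --ago 30d --untagged || return 1
  done
}
```

Calling purge_repos with the list of repository names inside a while loop, for several hours, corresponds to the setup in which the crash shows up.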

Expected behavior The purge command keeps running successfully until all repos in the list it loops over are processed.

Screenshots

Initially I just got the following error repeatedly, which led me to investigate the authentication part. I introduced a token refresh and similar changes, but it made no difference:

Error: error resolving authentication: acr.BaseClient#GetAcrAccessToken: Failure responding to request: StatusCode=401 -- Original Error: autorest/azure: error response cannot be parsed: {"" '\x00' '\x00'} error: EOF

So in the end I added a retry mechanism to my script, with the -d debug switch enabled on the retry. That produced a proper stack trace showing where it crashes. It seems to be coming from the underlying azure-cli client being used.

acr login registryname.azurecr.io -u 00000000-0000-0000-0000-000000000000 --password-stdin < /acr/docker-token.txt -d
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7e2d11]

goroutine 1 [running]:
github.com/Azure/acr-cli/auth/oras.NewClient({{{0x7ffd16e92a6b, 0x24}, {0xc000294000, 0x42f}, {0x0, 0x0}, {0x0, 0x0}}, 0x0, 0x1})
	/go/src/github.com/Azure/acr-cli/auth/oras/client.go:32 +0x2b1
main.runLogin({{0x7ffd16e92a49, 0x1e}, {0x7ffd16e92a6b, 0x24}, {0xc000294000, 0x42f}, {0x0, 0x0, 0x0}, 0x1, ...})
	/go/src/github.com/Azure/acr-cli/cmd/acr/login.go:111 +0x3f8
main.newLoginCmd.func1(0xc000184600?, {0xc000156aa0?, 0x4?, 0x975bce?})
	/go/src/github.com/Azure/acr-cli/cmd/acr/login.go:56 +0x5d
github.com/spf13/cobra.(*Command).execute(0xc0001d6f08, {0xc000156a50, 0x5, 0x5})
	/go/src/github.com/Azure/acr-cli/vendor/github.com/spf13/cobra/command.go:985 +0xaca
github.com/spf13/cobra.(*Command).ExecuteC(0xc0001d6608)
	/go/src/github.com/Azure/acr-cli/vendor/github.com/spf13/cobra/command.go:1117 +0x3ff
github.com/spf13/cobra.(*Command).Execute(...)
	/go/src/github.com/Azure/acr-cli/vendor/github.com/spf13/cobra/command.go:1041
main.main()
	/go/src/github.com/Azure/acr-cli/cmd/acr/main.go:12 +0x4a
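
For reference, the retry that produced this trace can be sketched as below. This is an assumption about the shape of such a wrapper, not the actual script: login_with_retry is a hypothetical helper name. It takes the token file as its first argument and reopens it for each attempt, so the retry does not read from an already consumed stdin.

```shell
# Hypothetical helper: run a login command with the token file on stdin;
# if it fails, run it once more with the -d debug switch appended.
login_with_retry() {
  token_file="$1"
  shift
  "$@" < "$token_file" && return 0
  echo "login failed, retrying with -d" >&2
  "$@" -d < "$token_file"
}
```

Example call, using the command from the reproduction steps: login_with_retry /acr/docker-token.txt acr login registryname.azurecr.io -u 00000000-0000-0000-0000-000000000000 --password-stdin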

Any relevant environment information

  • OS: mcr.microsoft.com/acr/acr-cli Docker image
  • Version 0.13 / 0.14

Additional context The error happens sporadically, sometimes after 2h, sometimes after 6h. We're looping over an ACR with ~4TB of data.

nanoq66 Nov 07 '24 08:11

Just saw there is a new 0.14 tag with some updated dependencies. I've started a run with it; let's see if it changes the pattern.

nanoq66 Nov 07 '24 09:11

No change though :(

nanoq66 Nov 08 '24 08:11