
🐛 BUG: Error deleting worker `workers.api.error.unknown [code: 10013]`

ntotten opened this issue 1 year ago

Which Cloudflare product(s) does this pertain to?

Wrangler core

What version(s) of the tool(s) are you using?

[email protected]

What version of Node are you using?

18.19.0

What operating system and version are you using?

Github Actions, Ubuntu 22.04.03 LTS

Describe the Bug

Observed behavior

When attempting to delete Workers, an error is frequently received. As part of our CI process on GitHub we create a number of Workers in parallel (GitHub Actions matrix). The Workers are created, the tests run, and then the Workers are supposed to be deleted. We frequently see the deletes fail with the error `workers.api.error.unknown [code: 10013]`.

Similar bugs have been reported for deploy and other commands: https://github.com/cloudflare/workers-sdk/issues/4724

Expected behavior

The worker is deleted without error.

Steps to reproduce

  1. Create worker with wrangler deploy
  2. Run some requests against the worker for 30 seconds or so
  3. Run wrangler delete dist/worker.js --name THE_WORKER_NAME

This is reproducible when running Wrangler directly or via the GitHub Action.
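The matrix setup that triggers this looks roughly like the following (an illustrative sketch, not the exact workflow; the fixture names, secret names, and test script are placeholders):

```yaml
jobs:
  e2e:
    strategy:
      matrix:
        fixture: [basic-open-api, other-fixture]   # placeholder fixture names
    runs-on: ubuntu-22.04
    steps:
      - uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CF_API_TOKEN }}
          accountId: ${{ secrets.CF_ACCOUNT_ID }}
          command: deploy
      - run: ./run-tests.sh   # placeholder: sends requests at the worker for ~30s
      # Each matrix job issues its own delete, so the deletes run in parallel:
      - if: always()
        uses: cloudflare/wrangler-action@v3
        with:
          apiToken: ${{ secrets.CF_API_TOKEN }}
          accountId: ${{ secrets.CF_ACCOUNT_ID }}
          command: delete dist/worker.js --name ${{ matrix.fixture }}-ci-${{ github.run_id }}
```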

Please provide a link to a minimal reproduction

No response

Please provide any relevant error logs

2024-02-02T13:06:19.9549620Z ##[group]Run cloudflare/wrangler-action@v3
2024-02-02T13:06:19.9550162Z with:
2024-02-02T13:06:19.9550809Z   apiToken: ***
2024-02-02T13:06:19.9551226Z   accountId: ***
2024-02-02T13:06:19.9551827Z   workingDirectory: test-fixtures/basic-open-api
2024-02-02T13:06:19.9552569Z   command: delete dist/worker.js --name basic-open-api-ci-7756361575-2024-01-15
2024-02-02T13:06:19.9553208Z   quiet: false
2024-02-02T13:06:19.9553608Z env:
2024-02-02T13:06:19.9554018Z   PROJECT_STORAGE_LOCATION: /tmp/****-projects
2024-02-02T13:06:19.9554567Z   TEST_STORAGE_LOCATION: /tmp/test-projects
2024-02-02T13:06:19.9555239Z   NODE_AUTH_TOKEN: ***
2024-02-02T13:06:19.9555665Z   AUTH0_DOMAIN: ***
2024-02-02T13:06:19.9556295Z   NPM_CONFIG_USERCONFIG: /home/runner/work/_temp/.npmrc
2024-02-02T13:06:19.9556835Z   CF_COMPATIBILITY_DATE: 2024-01-15
2024-02-02T13:06:19.9557340Z   WORKER_NAME: basic-open-api-ci-7756361575-2024-01-15
2024-02-02T13:06:19.9558229Z   API_HOST: basic-open-api-ci-7756361575-2024-01-15.test-basic.****test.com
2024-02-02T13:06:19.9558924Z ##[endgroup]
2024-02-02T13:06:20.0318457Z ##[group]📥 Installing Wrangler
2024-02-02T13:06:20.0356759Z [command]/opt/hostedtoolcache/node/18.19.0/x64/bin/npm i [email protected]
2024-02-02T13:06:21.6530406Z 
2024-02-02T13:06:21.6531972Z up to date, audited 576 packages in 1s
2024-02-02T13:06:21.6535529Z 
2024-02-02T13:06:21.6535968Z 61 packages are looking for funding
2024-02-02T13:06:21.6536763Z   run `npm fund` for details
2024-02-02T13:06:21.6919412Z 
2024-02-02T13:06:21.6920138Z 2 vulnerabilities (1 high, 1 critical)
2024-02-02T13:06:21.6920777Z 
2024-02-02T13:06:21.6921139Z To address all issues, run:
2024-02-02T13:06:21.6922139Z   npm audit fix
2024-02-02T13:06:21.6922584Z 
2024-02-02T13:06:21.6923210Z Run `npm audit` for details.
2024-02-02T13:06:21.7060363Z ✅ Wrangler installed
2024-02-02T13:06:21.7061750Z ##[endgroup]
2024-02-02T13:06:21.7064363Z ##[group]🚀 Running Wrangler Commands
2024-02-02T13:06:21.7078364Z [command]/opt/hostedtoolcache/node/18.19.0/x64/bin/npx wrangler delete dist/worker.js --name basic-open-api-ci-7756361575-2024-01-15
2024-02-02T13:06:22.6057941Z  ⛅️ wrangler 3.13.2 (update available 3.26.0)
2024-02-02T13:06:22.6059429Z ---------------------------------------------
2024-02-02T13:06:22.6138395Z ? Are you sure you want to delete basic-open-api-ci-7756361575-2024-01-15? This action cannot be undone.
2024-02-02T13:06:22.6140375Z 🤖 Using default value in non-interactive context: yes
2024-02-02T13:07:10.3072396Z 
2024-02-02T13:07:10.3597082Z ✘ [ERROR] A request to the Cloudflare API (/accounts/41b5265674e8a276f1fa00f2c9884e59/workers/services/basic-open-api-ci-7756361575-2024-01-15) failed.
2024-02-02T13:07:10.3598908Z 
2024-02-02T13:07:10.3599313Z   workers.api.error.unknown [code: 10013]
2024-02-02T13:07:10.3600079Z   
2024-02-02T13:07:10.3601728Z   If you think this is a bug, please open an issue at: https://github.com/cloudflare/workers-sdk/issues/new/choose
2024-02-02T13:07:10.3603101Z 
2024-02-02T13:07:10.3603112Z 
2024-02-02T13:07:10.3825934Z ##[endgroup]
2024-02-02T13:07:10.3857824Z ##[error]The process '/opt/hostedtoolcache/node/18.19.0/x64/bin/npx' failed with exit code 1
2024-02-02T13:07:10.3869113Z ##[error]🚨 Action failed
2024-02-02T13:07:10.4144294Z Post job cleanup.
2024-02-02T13:07:10.4204402Z Post job cleanup.
2024-02-02T13:07:10.4930147Z [command]/usr/bin/git version
2024-02-02T13:07:10.4969347Z git version 2.43.0
2024-02-02T13:07:10.5010198Z Temporarily overriding HOME='/home/runner/work/_temp/8058c340-33e8-4941-a7b3-8e06894597dc' before making global git config changes
2024-02-02T13:07:10.5012053Z Adding repository directory to the temporary git global config as a safe directory
2024-02-02T13:07:10.5015526Z [command]/usr/bin/git config --global --add safe.directory /home/runner/work/core/core
2024-02-02T13:07:10.5049910Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand
2024-02-02T13:07:10.5080045Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :"
2024-02-02T13:07:10.5328063Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader
2024-02-02T13:07:10.5347416Z http.https://github.com/.extraheader
2024-02-02T13:07:10.5358055Z [command]/usr/bin/git config --local --unset-all http.https://github.com/.extraheader
2024-02-02T13:07:10.5385974Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :"
2024-02-02T13:07:10.5850915Z Cleaning up orphan processes

ntotten avatar Feb 02 '24 13:02 ntotten

Thanks for reporting this! Is this just an erroneous error, or are you also experiencing breaking behaviour? i.e. does the Worker actually get deleted?

penalosa avatar Feb 12 '24 16:02 penalosa

The worker does not get deleted, and this happens frequently. As part of our test process we create and delete a lot of Workers, and the deletes have become very unreliable.

ntotten avatar Feb 22 '24 23:02 ntotten

We need to investigate whether we can determine what this internal error is. It might just be that the multiple delete attempts cause a deadlock or something in the backend DB?

petebacondarwin avatar Mar 11 '24 15:03 petebacondarwin

The same happened to our CD process today when it ran wrangler publish path/to/index.js --config path/to/wrangler.toml. We have been running the same command for a long time and it never failed like that. Now our CD process has become unreliable.

We're using Wrangler version 2.20.0.

kaitoqueiroz avatar Mar 13 '24 10:03 kaitoqueiroz

I have escalated this to our internal API team to try to diagnose why you are seeing these errors. I will update on status by the end of this week.

petebacondarwin avatar Mar 13 '24 11:03 petebacondarwin

The problem appears to be that, since we must lock tables related to the script being deleted and also any zone associated with the Worker (including the workers.dev subdomain), trying to delete multiple Workers in an account very close together can result in a deadlock, or just a timeout waiting for the previous lock to release.

Can I check whether you are actually serializing these requests to delete Workers, or are they likely to be running in parallel? If you can ensure that each delete request has completed before you run a new one, it should lower the occurrence of this error.
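For example, a serialized cleanup step with a simple retry might look like this (a sketch, not official guidance; the `retry` helper and the worker-name list are hypothetical):

```shell
# Hypothetical cleanup sketch: run each `wrangler delete` one at a time,
# retrying on transient failures such as `workers.api.error.unknown [code: 10013]`.
retry() {
  # retry <max_attempts> <delay_seconds> <command...>
  local max=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Serialize the deletes instead of letting matrix jobs run them in parallel:
# for name in worker-one worker-two; do
#   retry 5 10 npx wrangler delete --name "$name"
# done
```

Running the deletes from a single job that the matrix jobs feed into (rather than one delete per matrix job) would keep only one lock-holding request in flight at a time.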

petebacondarwin avatar Mar 13 '24 17:03 petebacondarwin

~~I have the same issue (workers.api.error.unknown [code: 10013]) when running wrangler tail or streaming logs through the Dash UI. However, deploying works fine. Looks like it's related to https://github.com/cloudflare/workers-sdk/issues/5402 and https://www.cloudflarestatus.com/incidents/b31dxdz60366 in my case.~~

Works fine now

megahertz avatar Mar 27 '24 01:03 megahertz

Ideally one should avoid deleting lots of Workers in parallel, since each parallel request locks the DB and can trigger time-outs. I'm going to close this one for now, as we haven't had any more reports in the last couple of weeks. If you see this again, please comment here or create a new issue with further details.

petebacondarwin avatar Apr 15 '24 14:04 petebacondarwin