Fargate tasks not starting with "ResourceInitializationError: unable to pull secrets or registry auth"

Open NoelLH opened this issue 9 months ago • 12 comments

I'm trying to use the new Fargate approach in eu-central-1. (The same test repo has been used with Artillery Pro in eu-west-2 before.)

I've confirmed I have a VPC in the region, 3 public subnets, and that Artillery is correctly automatically using those subnets, so I don't think networking per se is the problem.

I've set up a user with the permissions as documented today. I created what's in the docs as a policy and attached this directly to a user group which my user is in – I wasn't totally clear how a role should be used if created.

It seems the ecr:GetAuthorizationToken is part of a policy that Artillery itself sets up via a worker role, and that's why it isn't in the documented policy to be set up manually in AWS. But I'm not sure what to try now to get it to run.

Version info:

Artillery: 2.0.0-37
Node.js:   v18.18.0
OS:        darwin

Running this command:

artillery run-fargate --count 1 --region eu-central-1 --overrides '{"config": {"phases": [{"duration": 1, "arrivalRate": 1}]}}' --output reports/report.json --record api-donations.yaml
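
The --overrides flag takes a JSON string, which is easy to mis-quote on the command line. A minimal sketch of generating the same command with shell-safe quoting is below. It assumes Python 3.8+ for shlex.join; the flags and values are copied from the command above, nothing else is implied about the CLI.

```python
import json
import shlex

# Overrides as a plain data structure; json.dumps produces valid JSON for us.
overrides = {"config": {"phases": [{"duration": 1, "arrivalRate": 1}]}}

# Flags and values are taken from the command in this report.
args = [
    "artillery", "run-fargate",
    "--count", "1",
    "--region", "eu-central-1",
    "--overrides", json.dumps(overrides),
    "--output", "reports/report.json",
    "--record", "api-donations.yaml",
]

# shlex.join quotes each argument so a POSIX shell passes it through unchanged.
command = shlex.join(args)
print(command)
```

If the command is launched from Python anyway, passing args directly to subprocess.run avoids shell quoting altogether.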

I expected to see this happen:

A test run on Fargate

Instead, this happened:

Test stopped with:

Launching workers... [14:23:04]
Waiting for Fargate... [14:23:05]
Waiting for workers to start: deprovisioning: 1 [14:23:37]
[
  {
    attachments: [ [Object] ],
    attributes: [ [Object] ],
    availabilityZone: 'eu-central-1a',
    clusterArn: 'arn:aws:ecs:eu-central-1:[AWS_ACCT_ID]:cluster/artilleryio-cluster',
    connectivity: 'CONNECTED',
    connectivityAt: 2023-10-07T13:23:09.090Z,
    containers: [ [Object] ],
    cpu: '4096',
    createdAt: 2023-10-07T13:23:05.779Z,
    desiredStatus: 'STOPPED',
    enableExecuteCommand: false,
    executionStoppedAt: 2023-10-07T13:23:15.947Z,
    group: 'family:artilleryio-loadgen-worker_fargate_artilleryio-cluster_8fa978b3a50ce517e081ee7c126a354204807b1b_155552',
    healthStatus: 'UNKNOWN',
    lastStatus: 'STOPPED',
    launchType: 'FARGATE',
    memory: '8192',
    overrides: {
      containerOverrides: [Array],
      inferenceAcceleratorOverrides: [],
      taskRoleArn: 'arn:aws:iam::[AWS_ACCT_ID]:role/artilleryio-ecs-worker-role'
    },
    platformVersion: '1.4.0',
    platformFamily: 'Linux',
    stopCode: 'TaskFailedToStart',
    stoppedAt: 2023-10-07T13:23:39.068Z,
    stoppedReason: 'ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve ecr registry auth: service call has been retried 1 time(s): AccessDeniedException: User: arn:aws:sts::[AWS_ACCT_ID]:assumed-role/artilleryio-ecs-worker-role/1c0acd7ed5f84dff888dd1811f2922ce is not authorized to perform: ecr:GetAuthorizationToken on resource: * because no identity-based policy allows the ecr:GetAuthorizationToken action status code: 400, request id: ed9fb8dd-d720-4607-986d-8790c14d35b9',
    stoppingAt: 2023-10-07T13:23:25.972Z,
    tags: [],
    taskArn: 'arn:aws:ecs:eu-central-1:[AWS_ACCT_ID]:task/artilleryio-cluster/1c0acd7ed5f84dff888dd1811f2922ce',
    taskDefinitionArn: 'arn:aws:ecs:eu-central-1:[AWS_ACCT_ID]:task-definition/artilleryio-loadgen-worker_fargate_artilleryio-cluster_8fa978b3a50ce517e081ee7c126a354204807b1b_155552:1',
    version: 4,
    ephemeralStorage: { sizeInGiB: 20 }
  }
]
Error: Worker init failure, aborting test
Error: Worker init failure, aborting test
    at waitForTasks2 ([project-dir]/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:14:19311)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async [project-dir]/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:13:873
Error: Worker init failure, aborting test
    at waitForTasks2 ([project-dir]/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:14:19311)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async [project-dir]/node_modules/@artilleryio/platform-fargate/lib/commands/run-cluster.js:13:873
Cleaning up... [14:23:47]
⠼ Error: error sending test data to Artillery Cloud
Test report may be incomplete
Request ID: a8b4c10f-6219-49a3-b989-0a389ad947ff
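
The useful part of the output above is buried in stoppedReason. As a rough sketch (not part of Artillery), the denied action and the principal can be pulled out with a regular expression. The function name parse_access_denied is made up for illustration, and the pattern only assumes the "User: ... is not authorized to perform: ... on resource: ..." wording seen in this log.

```python
import re

# Matches the AccessDeniedException wording seen in the stoppedReason above.
DENIED = re.compile(
    r"User: (?P<principal>\S+) is not authorized to perform: "
    r"(?P<action>[\w:*]+) on resource: (?P<resource>\S+)"
)


def parse_access_denied(stopped_reason):
    """Return principal, action and resource as a dict, or None if no match."""
    match = DENIED.search(stopped_reason)
    return match.groupdict() if match else None


# Sample text copied from the task description in this report.
reason = (
    "ResourceInitializationError: unable to pull secrets or registry auth: "
    "execution resource retrieval failed: unable to retrieve ecr registry auth: "
    "service call has been retried 1 time(s): AccessDeniedException: User: "
    "arn:aws:sts::[AWS_ACCT_ID]:assumed-role/artilleryio-ecs-worker-role/"
    "1c0acd7ed5f84dff888dd1811f2922ce is not authorized to perform: "
    "ecr:GetAuthorizationToken on resource: * because no identity-based policy "
    "allows the ecr:GetAuthorizationToken action"
)

details = parse_access_denied(reason)
print(details)
```

Here the principal is the assumed artilleryio-ecs-worker-role, which suggests the missing permission belongs on the worker role rather than on the user running the CLI.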

NoelLH avatar Oct 07 '23 13:10 NoelLH

Thanks @NoelLH! Looking into it - that permission should be added automatically without you needing to do anything.

hassy avatar Oct 09 '23 10:10 hassy

Hi, we encountered the same issue today. We resolved it by manually removing all Artillery resources on AWS, which forced Artillery to recreate everything, so it looks like some old setting got cached somewhere.

peldax avatar Oct 13 '23 09:10 peldax

Thanks for chiming in @peldax! @NoelLH - could you try one of:

  1. running the test in a different AWS account, or
  2. removing the old Artillery Pro CloudFormation stack, and then trying again

Everything is working as expected on my end, I've not been able to reproduce the issue.

hassy avatar Oct 20 '23 11:10 hassy

Thanks both!

I'm tight for time at the moment so trying to avoid setting up a distinct AWS account for this if possible @hassy.

I first removed all CloudFormation stacks I could find in all relevant regions & waited for the resource deletions (there was stuff from Artillery Pro and also old Serverless Artillery experiments), but this seemed to make no difference.

I then deleted the IAM role "artilleryio-ecs-worker-role", which had no permissions attached, and that changed the AccessDenied detail to:

    authorized to perform: iam:CreatePolicy on resource: policy 
    artilleryio-ecs-worker-policy because no identity-based policy allows the 
    iam:CreatePolicy action

Each time, it seems to create the worker role again OK but not any permissions/policies for it.

NoelLH avatar Oct 20 '23 14:10 NoelLH

I think I've sorted this for our account.

I believe the problem was a combination of the all-or-nothing approach to worker role creation and 2 errors in the Artillery docs for Fargate. Because of those errors, some of the required permissions were missing once enough of the IAM resources had been deleted for Artillery to attempt to recreate them:

  1. arn:aws:iam::123456789000:policy/ecs-worker-policy should be arn:aws:iam::123456789000:policy/artilleryio-ecs-worker-policy
  2. iam:AttachRolePolicy is required for resource arn:aws:iam::123456789000:role/artilleryio-ecs-worker-role, not [just] for the policy
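
To make the two corrections concrete, here is a sketch of the affected statements only, not the full policy from the docs. The function name worker_iam_statements is made up for illustration, 123456789000 is the docs' placeholder account ID, and iam:CreatePolicy is included because it is the action named in the AccessDenied message earlier in the thread. Whether iam:AttachRolePolicy still needs the policy ARN as a resource is an assumption; it is kept because the docs listed it.

```python
import json


def worker_iam_statements(account_id):
    """Build the IAM statements touched by the two corrections (fragment only)."""
    # Correction 1: the policy is named artilleryio-ecs-worker-policy.
    policy_arn = f"arn:aws:iam::{account_id}:policy/artilleryio-ecs-worker-policy"
    role_arn = f"arn:aws:iam::{account_id}:role/artilleryio-ecs-worker-role"
    return [
        {
            "Effect": "Allow",
            "Action": ["iam:CreatePolicy"],
            "Resource": [policy_arn],
        },
        {
            # Correction 2: the role ARN must be listed for AttachRolePolicy.
            # Keeping the policy ARN as well is an assumption, see above.
            "Effect": "Allow",
            "Action": ["iam:AttachRolePolicy"],
            "Resource": [role_arn, policy_arn],
        },
    ]


statements = worker_iam_statements("123456789000")
print(json.dumps(statements, indent=2))
```

These statements would be merged into the policy attached to the user or group that runs artillery run-fargate.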

NoelLH avatar Oct 26 '23 11:10 NoelLH

I am now unable to use Fargate with the new task definitions that have Parameter Store secrets, although I was able to run on Fargate a few months ago. This is the reason given for the task stopping:

ResourceInitializationError: unable to pull secrets or registry auth: execution resource retrieval failed: unable to retrieve secrets from ssm: service call has been retried 1 time(s): invalid ssm parameters: /artilleryio/ARTIFACTORY_AUTH,/artilleryio/ARTIFACTORY_EMAIL,/artilleryio/NPMRC,/artilleryio/NPM_REGISTRY,/artilleryio/NPM_SCOPE,/artilleryio/NPM_SCOPE_REGISTRY,/artilleryio/NPM_TOKEN
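
For anyone wanting to confirm which of these parameters are actually missing, a rough sketch follows. The helper names are made up for illustration. find_missing assumes a client shaped like boto3's SSM client, whose get_parameters call reports unknown names under InvalidParameters and accepts at most 10 names per call (7 are needed here). It does not create anything, since the expected default values are not stated in this thread.

```python
import re


def parse_invalid_ssm_parameters(stopped_reason):
    """Extract the comma-separated parameter names from a stoppedReason."""
    match = re.search(r"invalid ssm parameters: ([^\s']+)", stopped_reason)
    return match.group(1).split(",") if match else []


def find_missing(names, ssm_client):
    """Return the names SSM does not know about.

    ssm_client is assumed to behave like boto3.client("ssm").
    """
    response = ssm_client.get_parameters(Names=names)
    return response.get("InvalidParameters", [])


# Shortened copy of the message above; the parameter list is unchanged.
reason = (
    "unable to retrieve secrets from ssm: service call has been retried "
    "1 time(s): invalid ssm parameters: /artilleryio/ARTIFACTORY_AUTH,"
    "/artilleryio/ARTIFACTORY_EMAIL,/artilleryio/NPMRC,/artilleryio/NPM_REGISTRY,"
    "/artilleryio/NPM_SCOPE,/artilleryio/NPM_SCOPE_REGISTRY,/artilleryio/NPM_TOKEN"
)

names = parse_invalid_ssm_parameters(reason)
print(names)
```

With boto3 installed, find_missing(names, boto3.client("ssm", region_name="...")) would list whatever is still absent in the given region.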

zeeshanpolaris avatar Dec 05 '23 15:12 zeeshanpolaris

Attempting use of Artillery for the first time in an AWS account, and experiencing the same thing as @zeeshanpolaris above.

It looks like there is a function, ensureParameterExists, that is likely intended to do this conditional parameter creation, but I see no code that invokes it.

Perhaps this was missed when testing the migration of the Fargate support code in https://github.com/artilleryio/artillery/pull/2297? (Parameters already existing in the test environment?)

RobMullen avatar Dec 05 '23 16:12 RobMullen

@RobMullen @zeeshanpolaris apologies, fix incoming

hassy avatar Dec 05 '23 16:12 hassy

    @RobMullen @zeeshanpolaris apologies, fix incoming

Thank you. Appreciate it.

zeeshanpolaris avatar Dec 05 '23 16:12 zeeshanpolaris

Thanks again for reporting the issue @zeeshanpolaris @RobMullen

Fix is in this PR: https://github.com/artilleryio/artillery/pull/2354

A canary version of Artillery will be published once we merge to main, which you can use to check whether running a test works. (You can install the canary with npm install -g artillery@canary.) We will also publish v2.0.3 later today.

hassy avatar Dec 05 '23 17:12 hassy

Thanks. I added those default values manually and got it working. However, I also had to add these two permissions for CloudWatch Logs to the policy used by the role.

    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:CreateLogGroup"
      ],
      "Resource": [
        "arn:aws:logs:RegionHiddenForSecurity:AcountNumberHiddenForSecurity:log-group:artilleryio-log-group/*"
      ]
    }

zeeshanpolaris avatar Dec 05 '23 18:12 zeeshanpolaris

Thank you very much, @hassy, for jumping on this quickly!! I too have worked around this by manually creating the default Parameter Store entries. I will remove them and try out the canary when it becomes available.

RobMullen avatar Dec 06 '23 01:12 RobMullen