skyplane
[bug] `skyplane deprovision` doesn't handle a race condition when deprovisioning instance profiles
Describe the bug
When I run `skyplane deprovision`, every instance fails with a `NoSuchEntity` error from `GetInstanceProfile`, and the command aborts with the traceback below:
skyplane deprovision ✘ 1
Deprovisioning 13 instances
⠋ Deprovisioning ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/13 0:00:04
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-eaae092f_profile cannot be found.
⠹ Deprovisioning ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/13 0:00:04
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-4b718763_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-88dcb8d9_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-fb09845d_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-13c5f707_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-e467d709_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-8535d245_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-bbd01006_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-674378d1_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-eff32554_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-e2e9ad34_profile cannot be found.
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-d1395186_profile cannot be found.
⠦ Deprovisioning ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 0/13 0:00:04
21:17:51 [ERROR] Error running <lambda>: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-8ad031cf_profile cannot be found.
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/paras/code/skylark/skyplane/cli/cli.py:469 in deprovision │
│ │
│ 466 │ │
│ 467 │ if instances: │
│ 468 │ │ typer.secho(f"Deprovisioning {len(instances)} instances", fg="yellow", bold=True │
│ ❱ 469 │ │ do_parallel(lambda instance: instance.terminate_instance(), instances, desc="Dep │
│ 470 │ else: │
│ 471 │ │ typer.secho("No instances to deprovision", fg="yellow", bold=True) │
│ 472 │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ all = False │ │
│ │ filter_client_id = None │ │
│ │ instances = [ │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-00f31c80229d8a323), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-003c6b1168fe19138), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-03b4f7cf2ba8df60e), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-0440d25860c3fc925), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-05f404ad5438a685e), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-03d0bd145c705cd74), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-0645235e9de0a3886), │ │
│ │ │ AWSServer(region_tag=aws:us-east-1, instance_id=i-0f48a8f42c07aa9a4), │ │
│ │ │ AWSServer(region_tag=aws:us-east-1, instance_id=i-01419491bb9fa8d34), │ │
│ │ │ AWSServer(region_tag=aws:us-east-1, instance_id=i-05eb91c810efa9d47), │ │
│ │ │ ... +3 │ │
│ │ ] │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /Users/paras/code/skylark/skyplane/utils/fn.py:57 in do_parallel │
│ │
│ 54 │ │ │ with ThreadPoolExecutor(max_workers=n) as executor: │
│ 55 │ │ │ │ future_list = [executor.submit(wrapped_fn, args) for args in args_list] │
│ 56 │ │ │ │ for future in as_completed(future_list): │
│ ❱ 57 │ │ │ │ │ args, result = future.result() │
│ 58 │ │ │ │ │ results.append((args, result)) │
│ 59 │ │ │ │ │ progress.update(progress_task, advance=1) │
│ 60 │ if spinner_persist: │
│ │
│ ╭────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ arg_fmt = <function do_parallel.<locals>.<lambda> at 0x13dc98550> │ │
│ │ args_list = [ │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-00f31c80229d8a323), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-003c6b1168fe19138), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-03b4f7cf2ba8df60e), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-0440d25860c3fc925), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-05f404ad5438a685e), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-03d0bd145c705cd74), │ │
│ │ │ AWSServer(region_tag=aws:us-east-2, instance_id=i-0645235e9de0a3886), │ │
│ │ │ AWSServer(region_tag=aws:us-east-1, instance_id=i-0f48a8f42c07aa9a4), │ │
│ │ │ AWSServer(region_tag=aws:us-east-1, instance_id=i-01419491bb9fa8d34), │ │
│ │ │ AWSServer(region_tag=aws:us-east-1, instance_id=i-05eb91c810efa9d47), │ │
│ │ │ ... +3 │ │
│ │ ] │ │
│ │ desc = 'Deprovisioning' │ │
│ │ executor = <concurrent.futures.thread.ThreadPoolExecutor object at 0x168d5cc10> │ │
│ │ func = <function deprovision.<locals>.<lambda> at 0x13dc985e0> │ │
│ │ future = <Future at 0x294928f70 state=finished raised NoSuchEntityException> │ │
│ │ future_list = [ │ │
│ │ │ <Future at 0x168d5c820 state=finished raised NoSuchEntityException>, │ │
│ │ │ <Future at 0x168d5cb20 state=finished raised NoSuchEntityException>, │ │
│ │ │ <Future at 0x290cfbb20 state=finished raised NoSuchEntityException>, │ │
│ │ │ <Future at 0x295712920 state=finished raised NoSuchEntityException>, │ │
│ │ │ <Future at 0x294928f70 state=finished raised NoSuchEntityException>, │ │
│ │ │ <Future at 0x2948a7fd0 state=finished raised NoSuchEntityException>, │ │
│ │ │ <Future at 0x291e4c610 state=finished raised NoSuchEntityException>, │ │
│ │ │ <Future at 0x29029c190 state=finished raised NoSuchEntityException>, │ │
│ │ │ <Future at 0x290d86560 state=finished raised NoSuchEntityException>, │ │
│ │ │ <Future at 0x290d241f0 state=finished raised NoSuchEntityException>, │ │
│ │ │ ... +3 │ │
│ │ ] │ │
│ │ n = 13 │ │
│ │ progress = <rich.progress.Progress object at 0x291f1b0a0> │ │
│ │ progress_task = 0 │ │
│ │ results = [] │ │
│ │ return_args = True │ │
│ │ spinner = True │ │
│ │ spinner_persist = True │ │
│ │ t = <skyplane.utils.timer.Timer object at 0x168d5dc30> │ │
│ │ wrapped_fn = <function do_parallel.<locals>.wrapped_fn at 0x13dc98700> │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/concurrent/futures/_base.py:451 in result │
│ │
│ 448 │ │ │ │ if self._state in [CANCELLED, CANCELLED_AND_NOTIFIED]: │
│ 449 │ │ │ │ │ raise CancelledError() │
│ 450 │ │ │ │ elif self._state == FINISHED: │
│ ❱ 451 │ │ │ │ │ return self.__get_result() │
│ 452 │ │ │ │ │
│ 453 │ │ │ │ self._condition.wait(timeout) │
│ 454 │
│ │
│ ╭──── locals ────╮ │
│ │ self = None │ │
│ │ timeout = None │ │
│ ╰────────────────╯ │
│ │
│ /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/concurrent/futures/_base.py:403 in │
│ __get_result │
│ │
│ 400 │ def __get_result(self): │
│ 401 │ │ if self._exception: │
│ 402 │ │ │ try: │
│ ❱ 403 │ │ │ │ raise self._exception │
│ 404 │ │ │ finally: │
│ 405 │ │ │ │ # Break a reference cycle with the exception in self._exception │
│ 406 │ │ │ │ self = None │
│ │
│ ╭── locals ───╮ │
│ │ self = None │ │
│ ╰─────────────╯ │
│ │
│ /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/concurrent/futures/thread.py:58 in run │
│ │
│ 55 │ │ │ return │
│ 56 │ │ │
│ 57 │ │ try: │
│ ❱ 58 │ │ │ result = self.fn(*self.args, **self.kwargs) │
│ 59 │ │ except BaseException as exc: │
│ 60 │ │ │ self.future.set_exception(exc) │
│ 61 │ │ │ # Break a reference cycle with the exception 'exc' │
│ │
│ ╭── locals ───╮ │
│ │ self = None │ │
│ ╰─────────────╯ │
│ │
│ /Users/paras/code/skylark/skyplane/utils/fn.py:43 in wrapped_fn │
│ │
│ 40 │ │
│ 41 │ def wrapped_fn(args): │
│ 42 │ │ try: │
│ ❱ 43 │ │ │ return args, func(args) │
│ 44 │ │ except Exception as e: │
│ 45 │ │ │ logger.error(f"Error running {func.__name__}: {e}") │
│ 46 │ │ │ raise │
│ │
│ ╭────────────────────────────────── locals ───────────────────────────────────╮ │
│ │ args = AWSServer(region_tag=aws:us-east-2, instance_id=i-05f404ad5438a685e) │ │
│ │ func = <function deprovision.<locals>.<lambda> at 0x13dc985e0> │ │
│ ╰─────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /Users/paras/code/skylark/skyplane/cli/cli.py:469 in <lambda> │
│ │
│ 466 │ │
│ 467 │ if instances: │
│ 468 │ │ typer.secho(f"Deprovisioning {len(instances)} instances", fg="yellow", bold=True │
│ ❱ 469 │ │ do_parallel(lambda instance: instance.terminate_instance(), instances, desc="Dep │
│ 470 │ else: │
│ 471 │ │ typer.secho("No instances to deprovision", fg="yellow", bold=True) │
│ 472 │
│ │
│ ╭──────────────────────────────────── locals ─────────────────────────────────────╮ │
│ │ instance = AWSServer(region_tag=aws:us-east-2, instance_id=i-05f404ad5438a685e) │ │
│ ╰─────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /Users/paras/code/skylark/skyplane/compute/server.py:169 in terminate_instance │
│ │
│ 166 │ def terminate_instance(self): │
│ 167 │ │ """Terminate instance""" │
│ 168 │ │ self.close_server() │
│ ❱ 169 │ │ self.terminate_instance_impl() │
│ 170 │ │
│ 171 │ def enable_auto_shutdown(self, timeout_minutes=None): │
│ 172 │ │ if timeout_minutes is None: │
│ │
│ ╭────────────────────────────────── locals ───────────────────────────────────╮ │
│ │ self = AWSServer(region_tag=aws:us-east-2, instance_id=i-05f404ad5438a685e) │ │
│ ╰─────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /Users/paras/code/skylark/skyplane/compute/aws/aws_server.py:112 in terminate_instance_impl │
│ │
│ 109 │ │ │ profile = iam.InstanceProfile(profile["Arn"].split("/")[-1]) │
│ 110 │ │ │ │
│ 111 │ │ │ # remove all roles from instance profile │
│ ❱ 112 │ │ │ for role in profile.roles: │
│ 113 │ │ │ │ profile.remove_role(RoleName=role.name) │
│ 114 │ │ │ │
│ 115 │ │ │ # delete instance profile │
│ │
│ ╭──────────────────────────────────── locals ────────────────────────────────────╮ │
│ │ iam = iam.ServiceResource() │ │
│ │ profile = iam.InstanceProfile(name='skyplane-aws-eaae092f_profile') │ │
│ │ self = AWSServer(region_tag=aws:us-east-2, instance_id=i-05f404ad5438a685e) │ │
│ ╰────────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/boto3/resources/factory.py:48 │
│ 5 in get_reference │
│ │
│ 482 │ │ │ # the handler as if it were a response. This allows references │
│ 483 │ │ │ # to have their data loaded properly. │
│ 484 │ │ │ if needs_data and self.meta.data is None and hasattr(self, 'load'): │
│ ❱ 485 │ │ │ │ self.load() │
│ 486 │ │ │ return handler(self, {}, self.meta.data) │
│ 487 │ │ │
│ 488 │ │ get_reference.__name__ = str(reference_model.name) │
│ │
│ ╭─────────────────────────────────── locals ────────────────────────────────────╮ │
│ │ handler = <boto3.resources.response.ResourceHandler object at 0x17e26e7d0> │ │
│ │ needs_data = True │ │
│ │ self = iam.InstanceProfile(name='skyplane-aws-eaae092f_profile') │ │
│ ╰───────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/boto3/resources/factory.py:56 │
│ 4 in do_action │
│ │
│ 561 │ │ │ # We need a new method here because we want access to the │
│ 562 │ │ │ # instance via ``self``. │
│ 563 │ │ │ def do_action(self, *args, **kwargs): │
│ ❱ 564 │ │ │ │ response = action(self, *args, **kwargs) │
│ 565 │ │ │ │ self.meta.data = response │
│ 566 │ │ │ │
│ 567 │ │ │ # Create the docstring for the load/reload methods. │
│ │
│ ╭─────────────────────────────── locals ────────────────────────────────╮ │
│ │ action = <boto3.resources.action.ServiceAction object at 0x17e2166e0> │ │
│ │ args = () │ │
│ │ kwargs = {} │ │
│ │ self = iam.InstanceProfile(name='skyplane-aws-eaae092f_profile') │ │
│ ╰───────────────────────────────────────────────────────────────────────╯ │
│ │
│ /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/boto3/resources/action.py:88 │
│ in __call__ │
│ │
│ 85 │ │ │ params, │
│ 86 │ │ ) │
│ 87 │ │ │
│ ❱ 88 │ │ response = getattr(parent.meta.client, operation_name)(*args, **params) │
│ 89 │ │ │
│ 90 │ │ logger.debug('Response: %r', response) │
│ 91 │
│ │
│ ╭─────────────────────────────────── locals ────────────────────────────────────╮ │
│ │ args = () │ │
│ │ kwargs = {} │ │
│ │ operation_name = 'get_instance_profile' │ │
│ │ params = {'InstanceProfileName': 'skyplane-aws-eaae092f_profile'} │ │
│ │ parent = iam.InstanceProfile(name='skyplane-aws-eaae092f_profile') │ │
│ │ self = <boto3.resources.action.ServiceAction object at 0x17e2166e0> │ │
│ ╰───────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/botocore/client.py:515 in │
│ _api_call │
│ │
│ 512 │ │ │ │ │ f"{py_operation_name}() only accepts keyword arguments." │
│ 513 │ │ │ │ ) │
│ 514 │ │ │ # The "self" in this scope is referring to the BaseClient. │
│ ❱ 515 │ │ │ return self._make_api_call(operation_name, kwargs) │
│ 516 │ │ │
│ 517 │ │ _api_call.__name__ = str(py_operation_name) │
│ 518 │
│ │
│ ╭─────────────────────────────────── locals ───────────────────────────────────╮ │
│ │ args = () │ │
│ │ kwargs = {'InstanceProfileName': 'skyplane-aws-eaae092f_profile'} │ │
│ │ operation_name = 'GetInstanceProfile' │ │
│ │ py_operation_name = 'get_instance_profile' │ │
│ │ self = <botocore.client.IAM object at 0x291359b40> │ │
│ ╰──────────────────────────────────────────────────────────────────────────────╯ │
│ │
│ /opt/homebrew/Caskroom/miniconda/base/lib/python3.10/site-packages/botocore/client.py:934 in │
│ _make_api_call │
│ │
│ 931 │ │ if http.status_code >= 300: │
│ 932 │ │ │ error_code = parsed_response.get("Error", {}).get("Code") │
│ 933 │ │ │ error_class = self.exceptions.from_code(error_code) │
│ ❱ 934 │ │ │ raise error_class(parsed_response, operation_name) │
│ 935 │ │ else: │
│ 936 │ │ │ return parsed_response │
│ 937 │
│ │
│ ╭─────────────────────────────────────────── locals ───────────────────────────────────────────╮ │
│ │ additional_headers = {} │ │
│ │ api_params = {'InstanceProfileName': 'skyplane-aws-eaae092f_profile'} │ │
│ │ endpoint_url = 'https://iam.amazonaws.com' │ │
│ │ error_class = <class 'botocore.errorfactory.NoSuchEntityException'> │ │
│ │ error_code = 'NoSuchEntity' │ │
│ │ event_response = None │ │
│ │ handler = <function inject_api_version_header_if_needed at 0x13e123b50> │ │
│ │ http = <botocore.awsrequest.AWSResponse object at 0x17f0c0340> │ │
│ │ operation_model = OperationModel(name=GetInstanceProfile) │ │
│ │ operation_name = 'GetInstanceProfile' │ │
│ │ parsed_response = { │ │
│ │ │ 'Error': { │ │
│ │ │ │ 'Type': 'Sender', │ │
│ │ │ │ 'Code': 'NoSuchEntity', │ │
│ │ │ │ 'Message': 'Instance Profile skyplane-aws-eaae092f_profile │ │
│ │ cannot be found.' │ │
│ │ │ }, │ │
│ │ │ 'ResponseMetadata': { │ │
│ │ │ │ 'RequestId': '7f4c5574-4b56-4d76-a34b-80a8cfb1bf6f', │ │
│ │ │ │ 'HTTPStatusCode': 404, │ │
│ │ │ │ 'HTTPHeaders': { │ │
│ │ │ │ │ 'x-amzn-requestid': '7f4c5574-4b56-4d76-a34b-80a8cfb1bf6f', │ │
│ │ │ │ │ 'content-type': 'text/xml', │ │
│ │ │ │ │ 'content-length': '307', │ │
│ │ │ │ │ 'date': 'Tue, 22 Nov 2022 05:17:50 GMT' │ │
│ │ │ │ }, │ │
│ │ │ │ 'RetryAttempts': 0 │ │
│ │ │ } │ │
│ │ } │ │
│ │ request_context = { │ │
│ │ │ 'client_region': 'aws-global', │ │
│ │ │ 'client_config': <botocore.config.Config object at 0x29135b310>, │ │
│ │ │ 'has_streaming_input': False, │ │
│ │ │ 'auth_type': 'v4', │ │
│ │ │ 'signing': {'region': 'us-east-1', 'signing_name': 'iam'}, │ │
│ │ │ 'retries': { │ │
│ │ │ │ 'attempt': 1, │ │
│ │ │ │ 'invocation-id': '70812e64-ff35-47d4-936e-3e0dff558973', │ │
│ │ │ │ 'max': 5 │ │
│ │ │ }, │ │
│ │ │ 'timestamp': '20221122T051750Z' │ │
│ │ } │ │
│ │ request_dict = { │ │
│ │ │ 'url_path': '/', │ │
│ │ │ 'query_string': '', │ │
│ │ │ 'method': 'POST', │ │
│ │ │ 'headers': { │ │
│ │ │ │ 'Content-Type': 'application/x-www-form-urlencoded; │ │
│ │ charset=utf-8', │ │
│ │ │ │ 'User-Agent': 'Boto3/1.26.3 Python/3.10.6 Darwin/22.1.0 │ │
│ │ Botocore/1.29.9 Resource' │ │
│ │ │ }, │ │
│ │ │ 'body': { │ │
│ │ │ │ 'Action': 'GetInstanceProfile', │ │
│ │ │ │ 'Version': '2010-05-08', │ │
│ │ │ │ 'InstanceProfileName': 'skyplane-aws-eaae092f_profile' │ │
│ │ │ }, │ │
│ │ │ 'url': 'https://iam.amazonaws.com/', │ │
│ │ │ 'context': { │ │
│ │ │ │ 'client_region': 'aws-global', │ │
│ │ │ │ 'client_config': <botocore.config.Config object at │ │
│ │ 0x29135b310>, │ │
│ │ │ │ 'has_streaming_input': False, │ │
│ │ │ │ 'auth_type': 'v4', │ │
│ │ │ │ 'signing': {'region': 'us-east-1', 'signing_name': 'iam'}, │ │
│ │ │ │ 'retries': { │ │
│ │ │ │ │ 'attempt': 1, │ │
│ │ │ │ │ 'invocation-id': '70812e64-ff35-47d4-936e-3e0dff558973', │ │
│ │ │ │ │ 'max': 5 │ │
│ │ │ │ }, │ │
│ │ │ │ 'timestamp': '20221122T051750Z' │ │
│ │ │ } │ │
│ │ } │ │
│ │ self = <botocore.client.IAM object at 0x291359b40> │ │
│ │ service_id = 'iam' │ │
│ │ service_name = 'iam' │ │
│ ╰──────────────────────────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
NoSuchEntityException: An error occurred (NoSuchEntity) when calling the GetInstanceProfile operation: Instance Profile skyplane-aws-eaae092f_profile cannot be found.
(base) ~/c/skylark ❯❯❯
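The traceback ends in `terminate_instance_impl`, where reading `profile.roles` triggers a `GetInstanceProfile` call for a profile that no longer exists. One possible way to make that cleanup tolerant of a profile that is already gone (for example, removed by another worker or an earlier run) is sketched below. This is only a sketch, not Skyplane's actual fix: the helper name `delete_instance_profile_if_exists` is hypothetical, it uses the low-level IAM client calls rather than the `iam.InstanceProfile` resource seen in the traceback, and `FakeIAM` is an in-memory stand-in so the example runs without AWS credentials.

```python
def delete_instance_profile_if_exists(iam_client, profile_name):
    """Detach roles from an instance profile and delete it.

    Hypothetical helper: returns True if this call deleted the profile,
    False if the profile was already gone at any step.
    """
    # boto3 clients expose modeled errors under `client.exceptions`.
    not_found = iam_client.exceptions.NoSuchEntityException
    try:
        resp = iam_client.get_instance_profile(InstanceProfileName=profile_name)
    except not_found:
        return False  # nothing to clean up
    for role in resp["InstanceProfile"]["Roles"]:
        try:
            iam_client.remove_role_from_instance_profile(
                InstanceProfileName=profile_name, RoleName=role["RoleName"]
            )
        except not_found:
            pass  # role or profile was removed concurrently
    try:
        iam_client.delete_instance_profile(InstanceProfileName=profile_name)
    except not_found:
        return False  # someone else deleted it first
    return True


class _NoSuchEntity(Exception):
    """Stand-in for botocore's NoSuchEntityException (illustration only)."""


class FakeIAM:
    """In-memory stand-in for a boto3 IAM client (illustration only)."""

    class exceptions:
        NoSuchEntityException = _NoSuchEntity

    def __init__(self, profiles):
        # Maps profile name -> list of attached role names.
        self.profiles = {name: list(roles) for name, roles in profiles.items()}

    def get_instance_profile(self, InstanceProfileName):
        if InstanceProfileName not in self.profiles:
            raise _NoSuchEntity(InstanceProfileName)
        roles = [{"RoleName": r} for r in self.profiles[InstanceProfileName]]
        return {"InstanceProfile": {"Roles": roles}}

    def remove_role_from_instance_profile(self, InstanceProfileName, RoleName):
        if InstanceProfileName not in self.profiles:
            raise _NoSuchEntity(InstanceProfileName)
        self.profiles[InstanceProfileName].remove(RoleName)

    def delete_instance_profile(self, InstanceProfileName):
        if self.profiles.pop(InstanceProfileName, None) is None:
            raise _NoSuchEntity(InstanceProfileName)


# The profile and role names here are made up for the demonstration.
iam = FakeIAM({"skyplane-aws-example_profile": ["skyplane-example-role"]})
first = delete_instance_profile_if_exists(iam, "skyplane-aws-example_profile")
second = delete_instance_profile_if_exists(iam, "skyplane-aws-example_profile")
print(first, second)
```

Against real AWS, the same helper would presumably be called with `boto3.client("iam")` in place of `FakeIAM`. The second call returning `False` instead of raising is the behavior that would let deprovisioning continue when the profile has already been removed.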
Environment info (please complete the following information):
- OS: macOS
- Python version: 3.10.6 (per the `User-Agent` string in the traceback)
Is this still an issue?