Add a way to identify or clean up AWS resources
When deploying into production we generate AWS resources using:

```sh
python <workflow>.py --with retry step-functions create
```
The most basic workflow:

```python
from metaflow import FlowSpec, step


class TestFlow(FlowSpec):
    @step
    def start(self) -> None:
        print("hello world!")
        self.next(self.end)

    @step
    def end(self) -> None:
        pass


if __name__ == "__main__":
    TestFlow()
```
creates an AWS Batch job definition (`metaflow_<some-hash>`) and an AWS Step Functions state machine with the same name as the FlowSpec class (`TestFlow` in this case). All of these resources are created transparently and (as far as I can tell) there is no way to track them in Metaflow or through AWS APIs. This makes it difficult to maintain visibility into all of the resources being created by users and to eventually go back and clean them up if necessary.
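The closest workaround I've found is to enumerate resources by Metaflow's naming scheme with boto3. A minimal sketch (the name matching is a heuristic I've inferred from the created resources, not an official Metaflow API; `FLOW_NAME` refers to the example flow above):

```python
import boto3

FLOW_NAME = "TestFlow"  # the FlowSpec class name

sfn = boto3.client("stepfunctions")
batch = boto3.client("batch")

# The state machine shares its name with the FlowSpec class.
for page in sfn.get_paginator("list_state_machines").paginate():
    for sm in page["stateMachines"]:
        if sm["name"] == FLOW_NAME:
            print("state machine:", sm["stateMachineArn"])

# Job definitions only carry the opaque "metaflow_<some-hash>" name, so they
# can't be attributed to a specific flow from the name alone.
for page in batch.get_paginator("describe_job_definitions").paginate(status="ACTIVE"):
    for jd in page["jobDefinitions"]:
        if jd["jobDefinitionName"].startswith("metaflow_"):
            print("job definition:", jd["jobDefinitionArn"])
```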
Either AWS resources created by Metaflow commands should be tagged with AWS tags that logically group them together (even better if the user can specify the tags), and/or the `step-functions` subcommand should provide a way to destroy the resources it created.
I'm personally more in favor of the former because tags would also allow costs to be tracked, in addition to identifying resources.
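As a stopgap, both halves of the proposal can be approximated by hand with boto3. A sketch, with made-up ARNs and tag keys (this is exactly the step I'd want Metaflow to do for me at `create` time):

```python
import boto3

sfn = boto3.client("stepfunctions")
batch = boto3.client("batch")

# Tagging: Step Functions takes a list of {"key", "value"} pairs...
sfn.tag_resource(
    resourceArn="arn:aws:states:us-east-1:123456789012:stateMachine:TestFlow",
    tags=[{"key": "metaflow-flow", "value": "TestFlow"}],  # hypothetical tag key
)
# ...while Batch takes a plain dict.
batch.tag_resource(
    resourceArn="arn:aws:batch:us-east-1:123456789012:job-definition/metaflow_somehash:1",
    tags={"metaflow-flow": "TestFlow"},  # hypothetical tag key
)

# Destroying: delete the state machine and deregister the job definition.
sfn.delete_state_machine(
    stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:TestFlow"
)
batch.deregister_job_definition(
    jobDefinition="arn:aws:batch:us-east-1:123456789012:job-definition/metaflow_somehash:1"
)
```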
Makes sense, although not all resources can be tagged by Metaflow; for example, the underlying ECS instances used by AWS Batch can't be tagged because they time-share with other applications.
Any updates on this?
Interested in this as well... We track all our resources with tags, and it's a big limitation that we can't use them on deployed step functions. Also, there should be a straightforward way to destroy all resources that were created when deploying a step function.
Metaflow sets tags on all AWS Batch jobs, which AWS takes into account for cost reporting.
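For anyone relying on that: tags only show up in Cost Explorer after being activated as cost allocation tags in the billing console, and you can check what a given job was tagged with via boto3 (the job ARN below is a placeholder):

```python
import boto3

batch = boto3.client("batch")
# Returns the AWS tags attached to the given Batch job.
resp = batch.list_tags_for_resource(
    resourceArn="arn:aws:batch:us-east-1:123456789012:job/example-job-id"
)
print(resp["tags"])
```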