Support running flow (containing @batch steps) locally instead of AWS batch

Open davified opened this issue 3 years ago • 5 comments

Hi there, thank you again for the great work on this library :-)

I understand that the @batch decorator allows us to selectively run some steps locally and some on AWS Batch. (docs)

There are times, however, when I want to run my entire flow locally on a small slice of the dataset to get fast feedback that my flow still works. Currently, I have to manually comment out the @batch decorator in order to do that.

I was wondering if there's a way for me to tell metaflow to ignore the @batch decorator? For example,

# suggestion: this could ignore the `@batch` decorator and run entire flow locally
python myflow.py run --with local

# this will work as it currently does, and run steps with `@batch` on AWS Batch
python myflow.py run

Happy to hear your thoughts. Thanks again!!

davified avatar Jun 09 '21 01:06 davified

Yes, indeed! You can use the @resources decorator and then pass --with batch on the CLI, although that will execute the entire flow on AWS Batch. There is an open issue for supporting @local as well. Another alternative is to write a simple Python decorator that adds the @batch decorator to your step depending on the presence of an environment variable. I think I have an example handy somewhere, let me dig that up.

savingoyal avatar Jun 09 '21 01:06 savingoyal

#350

savingoyal avatar Jun 09 '21 02:06 savingoyal

Thanks for your prompt response @savingoyal !

"Another alternative is to write a simple Python decorator that can add @batch decorator to your step depending on the presence of an environment variable. I think I have an example handy somewhere - let me dig that up."

Could I trouble you to share an implementation of this?

davified avatar Jun 10 '21 00:06 davified

I have done something similar to what @savingoyal suggested that you may find useful @davified :

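import os
import sys

from metaflow import batch as mf_batch
from metaflow import step

BATCH_LOCAL_MODE_ENV_VAR = 'BATCH_LOCAL_MODE'

def batch(*args, **kwargs):
  # Intended to be used with parentheses, e.g. @batch(cpu=2), stacked above @step.
  if os.environ.get(BATCH_LOCAL_MODE_ENV_VAR, None) == '1':
    # Local mode: skip AWS Batch and hand back the plain @step decorator.
    sys.stderr.write('@batch operating in local development mode\n')
    return step
  else:
    # Remote mode: defer to Metaflow's own @batch with the same arguments.
    sys.stderr.write(
        f'@batch operating in remote mode. Set environment variable {BATCH_LOCAL_MODE_ENV_VAR}=1 to switch to '
        'local development mode\n'
    )
    return mf_batch(*args, **kwargs)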

You can put this in a file called local_batch.py and use it as a drop-in replacement for @batch from metaflow, i.e. all you should need to do is replace

from metaflow import batch

with something like

from local_batch import batch
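
For anyone who wants the same switch to also cope with a bare `@batch` (no parentheses), here is a rough sketch of the pattern in isolation. It does not import Metaflow: `fake_remote` is a hypothetical stand-in that only tags the function, and `make_switchable` is an illustrative helper name, not part of any library. Presumably one could pass Metaflow's own `batch` in place of `fake_remote`, but that is an assumption and is not exercised here.

```python
import os

LOCAL_MODE_ENV_VAR = "BATCH_LOCAL_MODE"  # same variable name as the wrapper above


def make_switchable(remote_decorator, env_var=LOCAL_MODE_ENV_VAR):
    """Return a decorator that is a no-op when env_var is set to "1"."""

    def switchable(*args, **kwargs):
        local = os.environ.get(env_var) == "1"
        if not local:
            # Remote mode: forward everything to the real decorator unchanged.
            return remote_decorator(*args, **kwargs)
        # Bare usage (@switchable): the single positional argument is the function.
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        # Parenthesised usage (@switchable(cpu=2)): return an identity decorator.
        return lambda func: func

    return switchable


def fake_remote(*args, **kwargs):
    """Hypothetical stand-in for a remote-execution decorator; it only tags the function."""

    def tag(func):
        func.runs_remotely = True
        return func

    if len(args) == 1 and callable(args[0]) and not kwargs:
        return tag(args[0])
    return tag


batch = make_switchable(fake_remote)

# The environment variable is read when the decorator is applied, not when the function runs.
os.environ[LOCAL_MODE_ENV_VAR] = "1"


@batch(cpu=2)
def train():
    return "trained"


@batch
def evaluate():
    return "evaluated"


os.environ[LOCAL_MODE_ENV_VAR] = "0"


@batch(cpu=2)
def train_remote():
    return "trained remotely"
```

Because the check happens at decoration time, the variable has to be set before the flow file is imported, for example on the command line that launches the run.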

freespace avatar Jun 14 '21 04:06 freespace

another thread for context and +1 https://outerbounds-community.slack.com/archives/C02116BBNTU/p1651716562792139

tuulos avatar May 05 '22 02:05 tuulos