
Metaflow takes a lot of time if there are a lot of files in the current directory

Open agrawalsourav98 opened this issue 3 years ago • 2 comments

I was wondering what the significance of walking the current directory is.

https://github.com/Netflix/metaflow/blob/5186c6c5bba36d9e77077413ee2495dc79da3dca/metaflow/package.py#L81-L92

I was trying to run a flow from my home directory, and a simple hello world flow was taking too long because my home directory contains a lot of subfolders and files.

Is it really necessary to walk the current directory, or is there a workaround for this?

Currently, the only workaround seems to be creating a separate folder for each flow.

agrawalsourav98 avatar Mar 10 '22 17:03 agrawalsourav98

@agrawalsourav98 We walk your current directory to capture a snapshot of your code. You can access it at any later point via `Run('MyFlow/42').code.tarball`.

savingoyal avatar Mar 10 '22 18:03 savingoyal
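
To illustrate why the snapshot step scales with the size of the working directory, here is a minimal sketch of the general technique (walk a directory tree and pack matching files into a tarball). It is not Metaflow's actual implementation; the function name `snapshot_code` and its `suffixes` parameter are hypothetical and exist only for this example.

```python
import io
import os
import tarfile


def snapshot_code(root, suffixes=(".py",)):
    """Return gzipped tarball bytes of every file under `root` whose
    name ends with one of `suffixes`.

    Hypothetical helper for illustration; not part of Metaflow.
    """
    suffixes = tuple(suffixes)  # str.endswith requires a tuple
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        # os.walk visits every subdirectory, so the cost grows with
        # the total number of directories and files below `root`.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if name.endswith(suffixes):
                    path = os.path.join(dirpath, name)
                    # Store paths relative to `root` inside the archive.
                    tar.add(path, arcname=os.path.relpath(path, root))
    return buf.getvalue()
```

Running something like this from a home directory would visit every subfolder, even those that contain no matching files, which is consistent with the slowdown described in the issue.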

Okay, I investigated a little more, and this should not normally take a lot of time. The reason it's taking so long is that my Anaconda working directory is inside the current directory.

Can a flag be added that ignores certain directories while walking? Folders like the Anaconda working directory contain a ton of '.py' files that are simply irrelevant to the flow. I know it sounds dumb, but Anaconda is just an example.

agrawalsourav98 avatar Mar 11 '22 03:03 agrawalsourav98
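
For reference, the kind of directory exclusion requested above can be sketched with plain `os.walk` by pruning the directory list in place. This is only an illustration of the idea, not an existing Metaflow flag or API; `walk_pruned` and `exclude_dirs` are hypothetical names.

```python
import os


def walk_pruned(root, exclude_dirs=(), suffixes=(".py",)):
    """Yield paths of matching files under `root`, skipping any
    directory whose name is in `exclude_dirs`.

    Hypothetical helper for illustration; not part of Metaflow.
    """
    excluded = set(exclude_dirs)
    suffixes = tuple(suffixes)  # str.endswith requires a tuple
    for dirpath, dirnames, filenames in os.walk(root):
        # Assigning to dirnames[:] mutates the list os.walk is using,
        # so excluded directories are never descended into
        # (this works because topdown=True is the default).
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            if name.endswith(suffixes):
                yield os.path.join(dirpath, name)
```

Usage might look like `list(walk_pruned(".", exclude_dirs={"anaconda3"}))`, where `anaconda3` is an assumed folder name. Because pruning happens before descent, the excluded tree costs nothing beyond listing its parent directory.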