metaflow
metaflow copied to clipboard
Metaflow takes a lot of time if there are a lot of files in the current directory
I was wondering what is the significance of walking the current directory.
https://github.com/Netflix/metaflow/blob/5186c6c5bba36d9e77077413ee2495dc79da3dca/metaflow/package.py#L81-L92
I was trying to run a flow from my home directory and a simple hello world flow was taking too long because I have a lot of subfolders and files in the home directory.
Is it really necessary to walk the present directory or is there a workaround for this?
Currently, the only way seems to be creating separate folders for each flow.
@agrawalsourav98 We walk your current directory to capture a snapshot of your code - you can access it at any later point - Run('MyFlow/42').code.tarball
Okay, investicated a little more and this should not take a lot of time. The reason its taking a lot of time is I have my anaconda working directory in the present directory.
Can a flag be added that ignores certain directories while walking? Folders like anaconda working directory have a ton of '.py' files that are simply unrelevant to the flow. I know it sounds dumb but anaconda is just for example.