Run class’s ‘finished’ property stays ‘False’ after failed run

Open · cavandervoort opened this issue 3 years ago · 2 comments

The ‘finished’ property of Metaflow’s Run class does not become ‘True’ for failed runs that never reach the ‘end’ step. It would be useful if the Run class had a property that becomes ‘True’ once the run terminates, whether or not it succeeded. The current ‘finished’ property does not appear to behave this way, at least in this case.

I expected ‘finished’ to become ‘True’ when my run failed (using my flow “FailureFlow”, which raises an ‘Exception’ before the ‘end’ step). Instead, the ‘finished’ property remains ‘False’, as seen here, many minutes after the Metaflow UI had recognized that the run failed:

[Screenshot, 2022-08-08 8:41 PM: the run’s ‘finished’ property still reads ‘False’ after the failure]
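The distinction being asked for can be illustrated with a small, self-contained model. Everything in it is hypothetical: `RunSnapshot`, `TaskState`, and the `terminated` property are illustration names, not Metaflow’s client API, and the model assumes that a task which has failed with no retries left ends the whole run. `finished` mimics the behavior reported above (true only once the ‘end’ step completes), while `terminated` becomes true on either success or a final failure.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"  # failed with no retries left


@dataclass
class RunSnapshot:
    """Hypothetical, simplified view of a run: step name -> states of its tasks."""

    tasks: dict[str, list[TaskState]] = field(default_factory=dict)

    @property
    def finished(self) -> bool:
        # Mirrors the reported behavior: only a completed 'end' step counts.
        return any(s is TaskState.SUCCEEDED for s in self.tasks.get("end", []))

    @property
    def terminated(self) -> bool:
        # Hypothetical property: True on success, or once any task has failed
        # for good (assumes a final task failure ends the whole run).
        failed = any(
            s is TaskState.FAILED for states in self.tasks.values() for s in states
        )
        return self.finished or failed
```

For a FailureFlow-like run whose ‘start’ step failed, `RunSnapshot({"start": [TaskState.FAILED]})` reports `finished` as False but `terminated` as True, which is the signal the issue is looking for.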

I am interested to hear others’ thoughts on this.

cavandervoort, Aug 09 '22 04:08

To get things started, here are some potential solutions:

1. Change the ‘finished’ property of Run (and possibly of Step, too) so that it also returns ‘True’ for a failed run that will not be retried.

  • This could be problematic because it changes behavior that existing users depend on.

2. Add a new ‘running’ or ‘active’ property to Run (and possibly Step) that returns a boolean indicating whether the run is still in progress.

  • This could be difficult because Metaflow may not currently have access to enough information to determine the correct response.
  • A pro is that it won’t affect functionality users depend on.

3. Leave the Run class unchanged and rely on existing functionality at the plugin level.

  • For example, the Argo plugin has an ArgoClient object that can provide users with up-to-date information on run status.

4. Add a LiveRun class to Metaflow that takes the name of an under-layer (e.g., “Argo” or “Kubeflow Pipelines”) and a dictionary with the information needed to trigger a run, and returns an object from that under-layer’s plugin.

  • This LiveRun object would have access to current information about the run, such as its status, which would help users tell whether a run has failed or is still running.
  • This would only work for under-layers whose plugins support this functionality.
  • A benefit of this model is that users would need to know less about what happens behind the scenes to access this information.
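Options 2 to 4 could be combined roughly as follows. This is a minimal sketch, not an existing Metaflow API: `LiveRun`, `argo_is_running`, and the backend registry are hypothetical names, and the dictionary passed in is assumed to be an Argo Workflow object that has already been fetched (a real plugin would fetch and refresh it itself, rather than take trigger info). The only real detail relied on is that Argo Workflows reports a workflow’s overall state in `status.phase`, where `Succeeded`, `Failed`, and `Error` are terminal.

```python
from __future__ import annotations

from typing import Callable, Dict, Mapping

# Argo Workflows records a workflow's overall state in status.phase;
# these phases mean the workflow will make no further progress.
ARGO_TERMINAL_PHASES = frozenset({"Succeeded", "Failed", "Error"})


def argo_is_running(workflow: Mapping) -> bool:
    """Return True while an Argo Workflow object (as a dict) is not in a terminal phase."""
    # A freshly submitted workflow may have no status yet; treat it as in progress.
    phase = workflow.get("status", {}).get("phase", "")
    return phase not in ARGO_TERMINAL_PHASES


# Hypothetical registry: under-layer name -> function that reads run status.
_STATUS_CHECKERS: Dict[str, Callable[[Mapping], bool]] = {
    "argo": argo_is_running,
}


class LiveRun:
    """Hypothetical LiveRun (option 4) exposing a 'running' flag (option 2)."""

    def __init__(self, backend: str, info: Mapping):
        checker = _STATUS_CHECKERS.get(backend.lower())
        if checker is None:
            # Only under-layers whose plugins support live status can be used.
            raise ValueError(f"LiveRun is not supported for backend {backend!r}")
        self._checker = checker
        self._info = info

    @property
    def running(self) -> bool:
        return self._checker(self._info)
```

Under-layers without a registered checker raise `ValueError`, reflecting the caveat that this approach only works where a plugin can supply live status, while the Run class itself stays unchanged as option 3 suggests.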

cavandervoort, Aug 09 '22 04:08

Follow-up discussion is taking place at https://outerbounds-community.slack.com/archives/C020U025QJK/p1660018284702399 via chat.metaflow.org. I will post a summary here after the discussion concludes.

savingoyal, Aug 09 '22 04:08