[Feature Request] Handle pipeline failure

Open marcominerva opened this issue 2 months ago • 3 comments

Context / Scenario

Currently, pipeline failures aren't handled at all:

https://github.com/microsoft/kernel-memory/blob/775301a1cdd84ab7a54458d3fa453a8763c0744d/service/Abstractions/Pipeline/DataPipeline.cs#L434-L439

The problem

It is important to keep track of errors that occur during pipeline execution.

Proposed solution

We can add a couple of properties to DataPipeline.cs:

/// <summary>
/// The step that failed, if any.
/// </summary>
public string? FailedStep { get; set; } = null;

/// <summary>
/// The error that caused the pipeline to fail, if any.
/// </summary>
public string? FailureReason { get; set; } = null;

FailureReason can be useful to immediately obtain information about the problem, but it is not strictly necessary to implement this feature.

Then, we need to handle exceptions during pipeline execution, both in InProcessPipelineOrchestrator.cs and in DistributedPipelineOrchestrator.cs.
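
As a rough illustration of the exception handling described above, here is a minimal sketch. It does not use the real orchestrator code: `SketchPipeline`, `SketchOrchestrator`, and the `runStep` delegate are hypothetical stand-ins, and the actual InProcessPipelineOrchestrator.cs and DistributedPipelineOrchestrator.cs would also need to persist the updated pipeline state, which is omitted here.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical, simplified stand-in for DataPipeline: only the members
// needed to illustrate failure tracking are included.
public sealed class SketchPipeline
{
    public List<string> RemainingSteps { get; } = new();
    public List<string> CompletedSteps { get; } = new();
    public string? FailedStep { get; set; }
    public string? FailureReason { get; set; }
}

// Hypothetical orchestrator loop, not the real Kernel Memory implementation.
public static class SketchOrchestrator
{
    // Runs the remaining steps in order. On the first exception, records the
    // failing step and the exception message on the pipeline, then stops.
    public static async Task RunAsync(SketchPipeline pipeline, Func<string, Task> runStep)
    {
        while (pipeline.RemainingSteps.Count > 0)
        {
            string step = pipeline.RemainingSteps[0];
            try
            {
                await runStep(step);
            }
            catch (Exception e)
            {
                pipeline.FailedStep = step;
                pipeline.FailureReason = e.Message;
                return;
            }

            // The step succeeded: move it from the remaining list to the completed list.
            pipeline.RemainingSteps.RemoveAt(0);
            pipeline.CompletedSteps.Add(step);
        }
    }
}
```

In this sketch the failed step stays in `RemainingSteps`, so a later retry could resume from it. Whether the real orchestrators should retry, rethrow, or stop is a design decision this issue leaves open.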

Finally, after updating the DataPipelineStatus.cs class accordingly, we just need this code:

public DataPipelineStatus ToDataPipelineStatus()
{
    return new DataPipelineStatus
    {
        Completed = this.Complete,
        Failed = this.FailedStep != null,
        Empty = this.Files.Count == 0,
        Index = this.Index,
        DocumentId = this.DocumentId,
        Tags = this.Tags,
        Creation = this.Creation,
        LastUpdate = this.LastUpdate,
        Steps = this.Steps,
        RemainingSteps = this.RemainingSteps,
        CompletedSteps = this.CompletedSteps,
        FailedStep = this.FailedStep,
        FailureReason = this.FailureReason,
    };
}
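
To show how a caller might consume the new fields, here is a small sketch. `SketchPipelineStatus` is a hypothetical subset of DataPipelineStatus containing only the properties relevant to this proposal, and `StatusFormatter` is an illustrative helper that does not exist in Kernel Memory.

```csharp
using System;

// Hypothetical subset of DataPipelineStatus: only the properties relevant
// to this proposal are shown. The real class has more members.
public sealed class SketchPipelineStatus
{
    public bool Completed { get; set; }
    public bool Failed { get; set; }
    public string? FailedStep { get; set; }
    public string? FailureReason { get; set; }
}

// Illustrative helper, not part of Kernel Memory.
public static class StatusFormatter
{
    // Produces a short human-readable summary, checking for failure first.
    public static string Describe(SketchPipelineStatus status)
    {
        if (status.Failed)
        {
            string reason = status.FailureReason ?? "no reason recorded";
            return $"failed at step '{status.FailedStep}': {reason}";
        }

        return status.Completed ? "completed" : "in progress";
    }
}
```

The helper falls back to a placeholder when FailureReason is null because, as noted above, that property is optional for this feature.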

Importance

would be great to have

marcominerva • Apr 24 '24 08:04