MLOpsPython

The ParallelRunStep does not terminate anymore / no batches are started

Open sigeisler opened this issue 3 years ago • 2 comments

Hi,

I have built upon this project and, just like your Azure DevOps pipeline, my parallel batch scoring pipelines no longer terminate: https://aidemos.visualstudio.com/MLOps/_build/results?buildId=5684&view=logs&j=9effb530-5327-5cf9-9ca2-ba5490ba1ebd

It seems like the actual run(mini_batch) method is never executed.

(Your DevOps pipeline also fails after 4 hours, so I assume you are encountering the same issue.)

Do you know what the reason for this is?

Thanks!

sigeisler, Apr 08 '21 19:04

I'm getting the same problem and I can't find a way to solve it. The ML pipeline starts and all the parallel jobs get created, but the mini batches don't do anything: after 55 minutes the process is still running and no outputs have been created. Keen to read Microsoft's response on this.

Sabel5, Apr 16 '21 16:04

Apparently the run(mini_batch) method as written doesn't work for tabular data. Try replacing the run function in the batch scoring script with the following:

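import numpy as np
import pandas as pd


def run(input_data: pd.DataFrame) -> pd.DataFrame:
    # `model` is the global that the scoring script loads in init().
    result = None
    for _, sample in input_data.iterrows():
        # Predict one row at a time; reshape the row into a single-sample 2D array.
        pred = model.predict(sample.values.reshape(1, -1))
        result = np.array(pred) if result is None else np.vstack((result, pred))
    if result is None:
        # Empty mini batch: nothing to score.
        return []
    # Reuse the input index so each score is joined to its own row.
    scores = pd.DataFrame(result, columns=["score"], index=input_data.index)
    return input_data.join(scores)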
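
If the model accepts a 2D array with many rows in a single predict call (an assumption; scikit-learn style estimators do), the row-by-row loop above could probably be replaced by one vectorized call. This is only a minimal sketch to try locally, not a confirmed fix: StubModel is a hypothetical stand-in for whatever init() loads in the real scoring script, and the sample DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd


class StubModel:
    """Hypothetical stand-in for the model loaded in init(); not part of the project."""

    def predict(self, X):
        # Return one score per row: here simply the row sum.
        return np.asarray(X, dtype=float).sum(axis=1)


model = StubModel()


def run(input_data: pd.DataFrame):
    # Empty mini batch: nothing to score.
    if input_data.empty:
        return []
    # Score all rows in one call instead of looping over iterrows().
    preds = np.asarray(model.predict(input_data.values)).reshape(-1)
    # Reuse the input index so the scores line up with their rows.
    scores = pd.DataFrame({"score": preds}, index=input_data.index)
    return input_data.join(scores)


# Local check with a made-up mini batch whose index does not start at 0.
batch = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=[10, 11])
scored = run(batch)
print(scored)
```

Running the function on a small DataFrame like this before submitting the pipeline is a cheap way to check that run returns one output row per input row.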

Sabel5, May 13 '21 22:05