
PFI (Permutation Feature Importance) takes forever on some regression models created using AutoML

Status: Open. lolsoftware opened this issue 1 year ago (3 comments).

System Information

  • Windows 11
  • ML.NET v4.0.0
  • .NET 8.0

Describe the bug

Calculating PFI seems to hang (take forever) for some regression models created using AutoML. For other models, trained on the same data, calculating PFI takes just a couple of seconds.

To Reproduce

Run the attached program.

Expected behavior

I would expect PFI to complete in a timely fashion, or at least to provide a mechanism for monitoring the progress of the PFI calculation, with the ability to cancel it.
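As a rough illustration of what such a hook could look like, here is a hypothetical sketch in Python (not C#, and not an ML.NET API): a per-feature loop that reports progress after each feature and checks a cancellation flag before starting the next one. The names `run_per_feature`, `on_progress` and `Cancelled` are invented for this example, and `score_feature` stands in for whatever work PFI does for a single feature.

```python
import threading
from typing import Callable, List, Optional


class Cancelled(Exception):
    """Raised when the caller asks the loop to stop early (hypothetical)."""


def run_per_feature(
    n_features: int,
    score_feature: Callable[[int], float],
    on_progress: Optional[Callable[[int, int], None]] = None,
    cancel: Optional[threading.Event] = None,
) -> List[float]:
    results = []
    for j in range(n_features):
        # Check for cancellation between features, before starting new work.
        if cancel is not None and cancel.is_set():
            raise Cancelled(f"stopped after {j} of {n_features} features")
        results.append(score_feature(j))
        # Report how many features are done out of the total.
        if on_progress is not None:
            on_progress(j + 1, n_features)
    return results


# Usage: record progress and request cancellation after two features.
seen = []
stop = threading.Event()

def report(done: int, total: int) -> None:
    seen.append(done)
    if done == 2:
        stop.set()

try:
    run_per_feature(5, lambda j: float(j), on_progress=report, cancel=stop)
except Cancelled as err:
    print(err)  # stopped after 2 of 5 features
```

Checking the flag only between features keeps the sketch simple. The trade-off is that a single very slow feature still cannot be interrupted midway.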

Screenshots, Code, Sample Projects

The attached program contains two regression models created using AutoML, and the data set used to create them. The program first calculates PFI for the first model, which takes just a couple of seconds. It then tries to calculate PFI for the second model, which takes forever.

Attachment: test project.zip

lolsoftware, Dec 17 '24 15:12

This issue is caused by the number of features being passed to the model. The model that runs fast looks like this: [image]

The model that doesn't looks like this: [image]

Since PFI runtime grows directly with the number of features, that explains the difference. The real question is why AutoML generated that many features. Is the accuracy the same for both models? @LittleLittleCloud any ideas why AutoML would generate that many features? @lolsoftware would you be able to share exactly how you created that second model? Was it trained on the data included in the test project you uploaded? If so, could you share the repro code?
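To see why runtime tracks feature count, here is a minimal, generic permutation feature importance sketch in Python. It is a textbook version for illustration, not ML.NET's implementation. `permutation_feature_importance`, `permutation_count` and the toy `model` are hypothetical names. The function scores the full data set once for a baseline, then once more per feature per permutation, so the number of scoring passes is 1 + n_features × permutation_count.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def permutation_feature_importance(
    predict: Callable[[Sequence[Row]], List[float]],
    rows: Sequence[Row],
    labels: Sequence[float],
    permutation_count: int = 1,
    seed: int = 0,
) -> List[float]:
    """Mean increase in mean squared error when each feature is shuffled.

    Scoring passes: 1 baseline + len(rows[0]) * permutation_count.
    """
    rng = random.Random(seed)

    def mse(preds: List[float]) -> float:
        return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)

    baseline = mse(predict(rows))
    importances = []
    for j in range(len(rows[0])):  # one outer iteration per feature
        total = 0.0
        for _ in range(permutation_count):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the label
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            total += mse(predict(shuffled)) - baseline  # a full scoring pass
        importances.append(total / permutation_count)
    return importances


# Usage: count how many times the model scores the whole data set.
calls = 0

def model(batch: Sequence[Row]) -> List[float]:
    global calls
    calls += 1
    return [2.0 * r[0] for r in batch]  # only feature 0 matters

rows = [[float(i), float(i % 3)] for i in range(20)]
labels = [2.0 * r[0] for r in rows]
importances = permutation_feature_importance(model, rows, labels, permutation_count=3)
print(calls)  # 7 = 1 baseline + 2 features * 3 permutations
```

With two features the overhead is negligible. A model whose input has many more features repeats the full scoring pass proportionally more times, even if most of those features carry no signal.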

michaelgsharp, Mar 10 '25 08:03

Please find attached the project that was used to create both models. The Learner routine and the PFI routine use the same data file.

TestLearner.zip

lolsoftware, Mar 10 '25 11:03

@LittleLittleCloud could you look into this using their project and see why AutoML might be creating that many features?

michaelgsharp, Mar 17 '25 05:03