machinelearning-samples
PermutationFeatureImportance output is confusing.
I ran the following sample code:
https://github.com/dotnet/machinelearning-samples/blob/7b6303a17be61294bd45df8e6e6b029695c0a8be/samples/csharp/end-to-end-apps/Model-Explainability/TaxiFarePrediction/TaxiFarePredictionConsoleApp/Program.cs
It produces the following PFI output:
Feature | PFI
VendorId | -0.105016
RateCode | -0.105016
PassengerCount | -0.378364
PassengerCount | -0.082747
TripTime | -0.105016
TripTime | -0.105016
TripDistance | -0.105016
TripDistance | -0.105016
PaymentType | -0.107889
FareAmount | -0.107410
Label | -0.096346
VendorIdEncoded | -0.105016
VendorIdEncoded | -0.105016
RateCodeEncoded | -0.105088
RateCodeEncoded | -0.231427
PaymentTypeEncoded | -0.691058
The Feature column contains duplicated names, and rows with the same name can even have different PFI values, such as PassengerCount and RateCodeEncoded. I guess this might be due to OneHotEncoding and NormalizeMeanVariance, but to end users it does not make much sense. How should the PFI output be interpreted? Am I missing something obvious? Thanks!