PermutationFeatureImportance output is confusing.

Open lgong-rms opened this issue 5 years ago • 0 comments

I was running the following sample code:

https://github.com/dotnet/machinelearning-samples/blob/7b6303a17be61294bd45df8e6e6b029695c0a8be/samples/csharp/end-to-end-apps/Model-Explainability/TaxiFarePrediction/TaxiFarePredictionConsoleApp/Program.cs

which gives the following PFI output:

Feature             |   PFI
VendorId            |   -0.105016
RateCode            |   -0.105016
PassengerCount      |   -0.378364
PassengerCount      |   -0.082747
TripTime            |   -0.105016
TripTime            |   -0.105016
TripDistance        |   -0.105016
TripDistance        |   -0.105016
PaymentType         |   -0.107889
FareAmount          |   -0.107410
Label               |   -0.096346
VendorIdEncoded     |   -0.105016
VendorIdEncoded     |   -0.105016
RateCodeEncoded     |   -0.105088
RateCodeEncoded     |   -0.231427
PaymentTypeEncoded  |   -0.691058

The Feature column contains duplicated names, and rows with the same name can even have different PFI values, as with PassengerCount and RateCodeEncoded. I guess this might be due to OneHotEncoding and NormalizeMeanVariance, but to an end user it does not make much sense. How should the PFI output be interpreted? Am I missing something obvious? Thanks!
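
To make my guess concrete, here is a minimal sketch in plain Python (not ML.NET code). It assumes that an encoded column occupies several slots in the final feature vector and that PFI reports one value per slot. The column widths and the `slot_names` helper are hypothetical, made up purely for illustration; I have not verified that this is what the sample actually does.

```python
# Hypothetical illustration only: none of these names come from ML.NET.

def slot_names(columns):
    """Expand each (name, width) column into one label per vector slot."""
    labels = []
    for name, width in columns:
        if width == 1:
            labels.append(name)
        else:
            labels.extend(f"{name}[{i}]" for i in range(width))
    return labels

# Assumed widths: a one-hot encoded column with two categories takes two slots.
columns = [("VendorIdEncoded", 2), ("RateCodeEncoded", 2), ("PassengerCount", 1)]

# Labelling every slot with only its source column name repeats names...
naive = [name for name, width in columns for _ in range(width)]

# ...whereas indexing the slots keeps each row distinguishable.
indexed = slot_names(columns)

print(naive)
print(indexed)
```

If something like this is happening, each duplicated row would correspond to a different slot of the same source column, which would explain why the PFI values differ. It would not by itself explain why a normalized numeric column such as PassengerCount shows up twice.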

Sep 14 '20 02:09 lgong-rms