NimbusML Lambda ranker does not throw an error when labels are strings as opposed to ordinal ints.

Open · dataninjia opened this issue 6 years ago · 1 comment

My team is currently using LightGBMrank through NimbusML for some ranking problems. However, we are a bit confused about the data type required for the label column; I couldn't find much documentation on this.

I tried a few iterations based on the default example given in the LightGBMrank documentation, which uses ordinal labels. Here are the iterations I tried:

1. The default example, with ordinal labels.
2. Changed the data input to a DataFrame to make sure the output is the same. It is.
3. Remapped the labels to strings: {0: "Bad", 1: "Fair", 2: "Good", 3: "Excellent"}.
4. Remapped the ordering and added a random label, "Goofy".
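One plausible reason the third and fourth iterations change results without failing: if string labels are turned into integer grades by some automatic rule (alphabetical or first-seen order), the intended relevance order is lost. The pure-Python sketch below is a hypothetical illustration; it assumes such a rule and does not reproduce NimbusML's actual conversion logic. `encode_sorted` and `encode_first_seen` are illustrative names, not library functions.

```python
# Hypothetical illustration: automatic string-to-integer encoding of graded
# labels. NimbusML's real conversion rule is NOT confirmed here.

INTENDED = {"Bad": 0, "Fair": 1, "Good": 2, "Excellent": 3}

def encode_sorted(labels):
    """Assign integer grades by alphabetical order of the distinct strings."""
    return {s: i for i, s in enumerate(sorted(set(labels)))}

def encode_first_seen(labels):
    """Assign integer grades in order of first appearance."""
    mapping = {}
    for s in labels:
        if s not in mapping:
            mapping[s] = len(mapping)
    return mapping

labels = ["Good", "Bad", "Excellent", "Fair", "Goofy"]

print(encode_sorted(labels))
# {'Bad': 0, 'Excellent': 1, 'Fair': 2, 'Good': 3, 'Goofy': 4}
print(encode_first_seen(labels))
# {'Good': 0, 'Bad': 1, 'Excellent': 2, 'Fair': 3, 'Goofy': 4}

# Under alphabetical encoding, "Excellent" ends up graded below "Good",
# and the nonsense label "Goofy" silently becomes the highest grade.
```

Neither rule raises an error for "Goofy"; both simply assign it a grade, which matches the behavior described in the issue.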

The NDCG results of these four runs all differ, and none of them broke the ranker: no error was raised, even for string labels.
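To see why the encoding alone can move NDCG, the sketch below holds one query's ranking fixed and scores it under the intended grades and under a hypothetical alphabetical encoding. It assumes the common NDCG definition with exponential gain (2**g - 1) and a log2(rank + 1) discount; NimbusML's exact evaluator settings are not confirmed here. In the real runs the model is also trained on the re-encoded labels, so the ranking itself would change too.

```python
import math

def dcg(grades):
    """DCG with exponential gain (2**g - 1) and log2(rank + 1) discount (rank is 1-based)."""
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))

def ndcg(grades):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

# One query: documents in the order a model ranked them, labeled with strings.
ranked_labels = ["Excellent", "Good", "Fair", "Bad"]

intended = {"Bad": 0, "Fair": 1, "Good": 2, "Excellent": 3}
alphabetical = {"Bad": 0, "Excellent": 1, "Fair": 2, "Good": 3}  # hypothetical encoding

for name, mapping in [("intended", intended), ("alphabetical", alphabetical)]:
    grades = [mapping[s] for s in ranked_labels]
    print(name, round(ndcg(grades), 4))
# intended 1.0
# alphabetical 0.7364
```

The same ranking is perfect under the intended grades but scores noticeably lower once the grades are scrambled.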

The attached IPython notebook has code to reproduce the issue.

lambdaRankTest.zip

Thanks, Mike

Feb 15 '19 21:02 dataninjia

My own opinion on the NimbusML effort is that this is a case where the effort to be "helpful" in the API has backfired. While in, say, multiclass classification the order in which classes are assigned is unimportant, in ranking it is really important. My first thought is that ranking should desist from trying to "help" in this manner. Instead, if someone feeds in an inappropriate type (like a string), it should offer suitably prescriptive advice on how to map it to an appropriate type, rather than trying to "guess," which will almost certainly have undesirable consequences.
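A minimal sketch of the fail-fast behavior proposed here: reject non-integer relevance labels and tell the user how to map them. `check_ranking_labels` is a hypothetical helper written for illustration; it is not part of NimbusML, and the requirement that grades be non-negative integers is an assumption about what a ranker would accept.

```python
import numbers

def check_ranking_labels(labels):
    """Hypothetical validator (not a NimbusML API): fail fast on string or
    other non-integer ranking labels instead of guessing an order."""
    for value in labels:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, numbers.Integral):
            raise TypeError(
                f"Ranking labels must be non-negative integer relevance grades, "
                f"got {value!r} of type {type(value).__name__}. Map string labels "
                f"explicitly before training, e.g. "
                f"{{'Bad': 0, 'Fair': 1, 'Good': 2, 'Excellent': 3}}."
            )
        if value < 0:
            raise ValueError(f"Ranking labels must be non-negative, got {value}.")
    return list(labels)

check_ranking_labels([0, 3, 2, 1])  # passes and returns [0, 3, 2, 1]

try:
    check_ranking_labels(["Good", "Bad", "Goofy"])
except TypeError as err:
    print(err)  # explains the problem and suggests an explicit mapping
```

The design choice is to make the user state the grade order, since only they know that "Excellent" outranks "Good".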

Feb 16 '19 02:02 TomFinley
