
Lambda ranker does not throw an error when labels are strings as opposed to ordinal ints.

Open dataninjia opened this issue 5 years ago • 1 comments

My team is currently using LightGBMrank through nimbus for some ranking problems. However, we are a bit confused about the data type required for the label column – I couldn’t find too much documentation on this.

I tried a few iterations based on the default example given in the LightGBMrank documentation, which has ordinal labels. Here are the iterations I tried:

  1. The default, with ordinal labels
  2. Changed data input to a data frame to make sure output is the same. It is.
  3. Remapped labels to str format {0: “Bad”, 1: “Fair”, 2: “Good”, 3: “Excellent”}.
  4. Remapped the ordering, and added a random label “Goofy”

The NDCG results across these 4 runs differ, and none of them broke the classifier.
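To illustrate why the label encoding can move NDCG, here is a minimal, self-contained sketch. It uses the common exponential-gain form of NDCG. The ranked list and the alphabetical mapping are hypothetical stand-ins for "some implicit string-to-integer conversion". How NimbusML actually converts string labels is not stated in this thread, so the sketch does not claim to reproduce it.

```python
import math

def dcg(gains):
    # DCG with exponential gain: sum of (2^g - 1) / log2(position + 1),
    # where position starts at 1 for the top-ranked item.
    return sum((2 ** g - 1) / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(gains):
    # Normalize by the DCG of the ideal (descending-gain) ordering.
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

# Hypothetical output of a ranker: labels of the returned items, best first.
ranked_labels = ["Excellent", "Good", "Fair", "Bad"]

# The ordering the user intends.
intended = {"Bad": 0, "Fair": 1, "Good": 2, "Excellent": 3}

# ASSUMPTION: an implicit conversion that numbers labels alphabetically.
# This is only an example of a mapping that ignores the intended order.
alphabetical = {label: i for i, label in enumerate(sorted(intended))}

ndcg_intended = ndcg([intended[label] for label in ranked_labels])
ndcg_alphabetical = ndcg([alphabetical[label] for label in ranked_labels])

print(ndcg_intended, ndcg_alphabetical)
```

Under the intended mapping the list is perfectly ordered, while the alphabetical mapping treats "Good" as better than "Excellent", so the same ranking scores lower. No error is raised in either case, which matches the behavior described above.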

The ipython notebook attached has code to reproduce the issue.

lambdaRankTest.zip

Thanks, Mike

dataninjia, Feb 15 '19 21:02

My own opinion, as far as the NimbusML effort goes, is that this is a case where trying to be "helpful" in the API has backfired. In, say, multiclass classification, the order in which class labels are assigned is unimportant, but in ranking it really matters. My first thought is that ranking should stop trying to "help" in this manner. Instead, if someone feeds in an inappropriate type (like a string), it should offer suitably prescriptive advice on how to map it to an appropriate type, rather than trying to "guess," which will almost certainly have undesirable consequences.
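As a user-side workaround in the spirit of the suggestion above, the string labels can be mapped to ordinal integers explicitly before they reach the ranker. This is a minimal sketch in plain Python, not NimbusML code: `LABEL_ORDER`, `LABEL_TO_RANK`, and `to_ordinal` are hypothetical names, and the label set is the one from the issue.

```python
# ASSUMPTION: the relevance grades, listed from worst to best.
LABEL_ORDER = ["Bad", "Fair", "Good", "Excellent"]
LABEL_TO_RANK = {label: rank for rank, label in enumerate(LABEL_ORDER)}

def to_ordinal(labels):
    """Map string relevance labels to ordinal ints, failing loudly on unknowns."""
    unknown = sorted({str(label) for label in labels if label not in LABEL_TO_RANK})
    if unknown:
        raise ValueError(
            f"Unknown relevance labels {unknown}; expected one of {LABEL_ORDER}. "
            "Map labels to ordinal integers explicitly before training a ranker."
        )
    return [LABEL_TO_RANK[label] for label in labels]

print(to_ordinal(["Good", "Bad", "Excellent"]))
```

With this in place, a stray label such as "Goofy" raises a `ValueError` instead of being silently assigned a relevance grade.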

TomFinley, Feb 16 '19 02:02