
[LightGBM] Weight column in LightGBM classifier is not working as per expectation

Open coolcoder001 opened this issue 1 year ago • 2 comments

SynapseML version

2.12:0.9.5

System information

  • Language version: Python 3.7, Scala 2.12
  • Spark version: 3.3.0
  • Spark platform: Databricks

Describe the problem

Hi, I am using LightGBMClassifier for a skewed binary classification problem. I have several features (A, B, C, and so on). I group the data by these features and compute weights for class 0 and class 1.
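For readers following along, one common way to derive per-group class weights is inverse class frequency within each group. The sketch below is an assumption about the weighting scheme, not the reporter's actual code: the function name `balanced_weights` and the `(group_key, label)` row layout are hypothetical, and it uses plain Python instead of Spark to stay short.

```python
from collections import Counter

def balanced_weights(rows):
    """Return one weight per (group_key, label) row.

    Hypothetical scheme: weight = n_group / (n_classes_in_group * n_group_label),
    so every class inside a group carries the same total weight.
    """
    group_sizes = Counter(g for g, _ in rows)               # rows per group
    class_sizes = Counter(rows)                             # rows per (group, label)
    classes_per_group = Counter(g for g, _ in class_sizes)  # distinct labels per group
    return [
        group_sizes[g] / (classes_per_group[g] * class_sizes[(g, y)])
        for g, y in rows
    ]

# Group "A" is skewed 3:1 toward class 0; group "B" is balanced.
rows = [("A", 0), ("A", 0), ("A", 0), ("A", 1), ("B", 0), ("B", 1)]
weights = balanced_weights(rows)
```

With this scheme the weights sum to the number of rows, so the overall scale of the loss stays comparable to the unweighted case.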

However, for the testing data I set all weights to 1.

I can see that the loss on my testing data is not converging. Is this the correct way to use the weightCol parameter?
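As a generic illustration of why the two losses can differ, the sketch below computes a weighted binary log loss in plain Python. It is not SynapseML's internal metric code, and the function name `weighted_log_loss` and the sample numbers are made up for the example. It shows that all-ones weights reproduce the unweighted loss, while up-weighting the minority class yields a different value, so a weighted training loss and a unit-weight test loss are not directly comparable.

```python
import math

def weighted_log_loss(labels, probs, weights=None):
    """Weighted mean of the binary cross-entropy; weights default to all 1s."""
    if weights is None:
        weights = [1.0] * len(labels)
    total = 0.0
    for y, p, w in zip(labels, probs, weights):
        p = min(max(p, 1e-15), 1 - 1e-15)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / sum(weights)

labels = [0, 0, 0, 1]
probs = [0.1, 0.2, 0.3, 0.4]  # predicted probability of class 1

unweighted = weighted_log_loss(labels, probs)
unit = weighted_log_loss(labels, probs, [1.0] * 4)
# Up-weight the single positive example, as a class-balancing scheme would.
reweighted = weighted_log_loss(labels, probs, [2 / 3, 2 / 3, 2 / 3, 2.0])
```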

One more observation: if I set isUnbalance to True at inference time, the model gives random predictions and the AUC drops to 50%. So I had to set isUnbalance to False for inference. Please let me know if this is the expected behavior.

Code to reproduce issue

from synapse.ml.lightgbm import LightGBMClassifier

# Tuned hyperparameters and column names for the classifier.
params = {
    'baggingFraction': 0.8156468375795559,
    'featureFraction': 0.8609557255311693,
    'featuresCol': 'features',
    'labelCol': 'label',
    'learningRate': 0.1449558170049662,
    'maxDepth': 29,
    'minSumHessianInLeaf': 0.03753901648224433,
    'numIterations': 80,
    'numLeaves': 133,
    'weightCol': 'weight',
    'objective': 'binary',
    'useSingleDatasetMode': True,
    'isUnbalance': False,
    'useBarrierExecutionMode': True,
    'parallelism': 'voting_parallel',
    'metric': 'auc',
}

# isUnbalance and parallelism are left commented out, so their defaults apply.
lgb = LightGBMClassifier(
    numIterations=params['numIterations'],
    numLeaves=params['numLeaves'],
    maxDepth=params['maxDepth'],
    baggingFraction=params['baggingFraction'],
    featureFraction=params['featureFraction'],
    minSumHessianInLeaf=params['minSumHessianInLeaf'],
    learningRate=params['learningRate'],
    objective=params['objective'],
    labelCol=params['labelCol'],
    featuresCol=params['featuresCol'],
    weightCol=params['weightCol'],
    useSingleDatasetMode=True,
    # isUnbalance=False,
    useBarrierExecutionMode=True,
    # parallelism="voting_parallel",
    metric=params['metric'],
)

Other info / logs

No response

What component(s) does this bug affect?

  • [ ] area/cognitive: Cognitive project
  • [ ] area/core: Core project
  • [ ] area/deep-learning: DeepLearning project
  • [X] area/lightgbm: Lightgbm project
  • [ ] area/opencv: Opencv project
  • [ ] area/vw: VW project
  • [ ] area/website: Website
  • [ ] area/build: Project build system
  • [ ] area/notebooks: Samples under notebooks folder
  • [ ] area/docker: Docker usage
  • [ ] area/models: models related issue

What language(s) does this bug affect?

  • [ ] language/scala: Scala source code
  • [X] language/python: Pyspark APIs
  • [ ] language/r: R APIs
  • [ ] language/csharp: .NET APIs
  • [ ] language/new: Proposals for new client languages

What integration(s) does this bug affect?

  • [ ] integrations/synapse: Azure Synapse integrations
  • [ ] integrations/azureml: Azure ML integrations
  • [X] integrations/databricks: Databricks integrations

coolcoder001 avatar May 30 '23 07:05 coolcoder001

Hey @coolcoder001 :wave:! Thank you so much for reporting the issue/feature request :rotating_light:. Someone from SynapseML Team will be looking to triage this issue soon. We appreciate your patience.

github-actions[bot] avatar May 30 '23 07:05 github-actions[bot]

We have released 0.11.2, which has newer features. We aren't really supporting 0.9.5 anymore, and will release the official 1.0 version soon.

svotaw avatar Jul 17 '23 19:07 svotaw