SynapseML
LightGBM validationIndicatorCol: what exactly are the values in this str column?
Hi, folks!
validationIndicatorCol (str): Indicates whether the row is for training or validation
def setValidationIndicatorCol(value: String): this.type = set(validationIndicatorCol, value)
Does this mean it is just a string column with two values, literally "training" or "validation"?
Originally posted by @whiteneverdie in https://github.com/Azure/mmlspark/issues/689#issuecomment-644167974
@whiteneverdie great question, and sorry about the confusion. Yes, this should just be the name of the column: https://github.com/Azure/mmlspark/blob/master/src/main/scala/com/microsoft/ml/spark/core/contracts/Params.scala#L179
The column itself should just contain booleans; we filter on it here: https://github.com/Azure/mmlspark/blob/master/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMBase.scala#L175
Note that there is an issue with having huge validation sets. Hopefully that's not the case for your scenario: https://github.com/Azure/mmlspark/issues/689
Oh, actually, it looks like you commented on that issue already, sorry I didn't notice your comment there.
So the values in validationIndicatorCol are booleans, hmm. But where can we set which metric is used for validation? @imatiach-msft thanks!
@JWenBin the metric for validation can be set here, using setMetric: https://github.com/Azure/mmlspark/blob/master/src/main/scala/com/microsoft/ml/spark/lightgbm/LightGBMParams.scala#L310
Hi~ I have a question. Why does the validation data participate in model training when using setValidationIndicatorCol? Looking forward to your reply.
So, I think the code should describe this more clearly, for example:
- The param contains the name of a column
- That column's type is boolean: true if the row belongs to the validation set, false if it belongs to the training set
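For anyone landing here later, the semantics described above can be sketched in plain Python. This is only an illustration of the idea (a boolean column whose name is passed as the param), not SynapseML code and not how the library implements it. The helper names `add_validation_indicator` and `split_by_indicator`, the column name `is_validation`, and the 20% split are all made up for the example.

```python
import random


def add_validation_indicator(rows, validation_fraction=0.2, seed=42,
                             col="is_validation"):
    """Return copies of the rows with an extra boolean column.

    True marks a validation row, False marks a training row.
    (Hypothetical helper; the column name is an arbitrary choice.)
    """
    rng = random.Random(seed)
    return [{**row, col: rng.random() < validation_fraction} for row in rows]


def split_by_indicator(rows, col="is_validation"):
    """Partition rows on the boolean indicator column.

    Rejects non-boolean values such as the strings "training" or
    "validation", since the column is expected to hold booleans.
    """
    for row in rows:
        if not isinstance(row[col], bool):
            raise TypeError(
                f"column {col!r} must contain booleans, got {row[col]!r}"
            )
    train = [row for row in rows if not row[col]]
    valid = [row for row in rows if row[col]]
    return train, valid


# Toy data standing in for a DataFrame.
rows = [{"feature": i, "label": i % 2} for i in range(10)]
marked = add_validation_indicator(rows)
train, valid = split_by_indicator(marked)
```

In the real API you would add an equivalent boolean column to your DataFrame and pass only its name to setValidationIndicatorCol, as described in the answer above.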