Ilya Matiach

261 comments by Ilya Matiach

"Also it not clear what the difference between [areaUnderROC](https://github.com/microsoft/SynapseML/blob/383cb951811908fe29b85253edfd8dffb9b2241c/core/src/main/scala/com/microsoft/azure/synapse/ml/core/metrics/MetricConstants.scala#L19) (MetricConstants.AreaUnderROCMetric) and [AUC](https://github.com/microsoft/SynapseML/blob/383cb951811908fe29b85253edfd8dffb9b2241c/core/src/main/scala/com/microsoft/azure/synapse/ml/core/metrics/MetricConstants.scala#L20) (MetricConstants.AucSparkMetric) as [getAUC](https://github.com/microsoft/SynapseML/blob/be4965858d9fd11355bd284010ea51f2ccdc55c9/core/src/main/scala/com/microsoft/azure/synapse/ml/train/ComputeModelStatistics.scala#L381) internally calls [areaUnderROC](https://github.com/microsoft/SynapseML/blob/be4965858d9fd11355bd284010ea51f2ccdc55c9/core/src/main/scala/com/microsoft/azure/synapse/ml/train/ComputeModelStatistics.scala#L394) from spark.mllib.evaluation.BinaryClassificationMetrics." Yes, indeed, they are synonymous. "Hmm, getAUC calls areaUnderROC not...

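As a rough illustration of the areaUnderROC/AUC comment above: both names refer to the area under the ROC curve. The snippet below is a minimal plain-Python sketch of that quantity using the trapezoidal rule. It is not the SynapseML or Spark implementation; the function names `roc_points` and `area_under_roc` and the sample data are hypothetical.

```python
def roc_points(labels, scores):
    """Return ROC points (false positive rate, true positive rate).

    Hypothetical helper for illustration only. labels are 0/1,
    scores are the predicted scores for the positive class.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative label")

    # Sweep the decision threshold from the highest score downwards.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Rows with identical scores cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under_roc(labels, scores):
    """Integrate the ROC curve with the trapezoidal rule."""
    pts = roc_points(labels, scores)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


if __name__ == "__main__":
    # Made-up example data, not taken from the issue.
    print(area_under_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Whichever metric name is requested, a value computed this way should be the same number, since only the label differs.
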
@alzio2607 could you please send a notebook that reproduces the issue? "The predictions on the pandas df should match with the predictions on spark df in mmlSpark." I agree that...

@BrianMiner currently lightgbm loads all data into both java and native memory, but @svotaw is working on implementing a streaming mode which will allow lightgbm to stream the java data...

Also, it is actually possible that you are running into an error other than a memory error. I would recommend trying the latest code. You can also find the error...

I would know this much better if I could see the cluster logs. Note that the build above is just the latest master; it doesn't yet include the new optimizations. I wrote...

Interesting, it might just be an issue with how much memory you have assigned to executors. Maybe you have little memory assigned to each executor, hence this would explain that...

@BrianMiner no, sorry. @svotaw is working on this. He was running into some seg faults that he just fixed at the end of last week, and he is on summer vacation...

Yes, it looks like a thread concurrency bug/race condition in glibc, which was fixed in 2021, so I wonder if upgrading might fix it. Related issue: https://github.com/puppeteer/puppeteer/issues/2207 same link as...

@glogowski-wojciech great catch. Yes, that needs to be changed to self.model instead of model. I'll send a PR for the fix.

@Shafi2016 it looks like you have some empty partitions in your spark dataframe. That is just a warning though. The problem is that you are running out of memory, based...