LightGBM training on CI fails with gcc error

Open shuttie opened this issue 4 years ago • 1 comments

In RanklensTest, this fails only when running in GitHub Actions CI:

[info]   io.github.metarank.lightgbm4j.LGBMException: random_device could not be read
[info]   at io.github.metarank.lightgbm4j.LGBMBooster.create(LGBMBooster.java:262)
[info]   at io.github.metarank.ltrlib.booster.LightGBMBooster$.apply(LightGBMBooster.scala:39)
[info]   at io.github.metarank.ltrlib.booster.LightGBMBooster$.apply(LightGBMBooster.scala:19)
[info]   at io.github.metarank.ltrlib.ranking.pairwise.LambdaMART.fit(LambdaMART.scala:50)
[info]   at ai.metarank.mode.train.Train$.trainModel(Train.scala:55)
[info]   at ai.metarank.e2e.RanklensTest.$anonfun$new$3(RanklensTest.scala:82)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)

shuttie • Mar 28 '22 18:03

The root cause is https://github.com/actions/virtual-environments/issues/672. The LightGBM native library uses too much entropy during training, and the GHA VM has some issues with its randomness sources. Right now we don't train the LGBM model in GHA (our experiment with a predefined seed in ltrlib also failed), but we're investigating further what can be done here.
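
For illustration, skipping training on GHA could be gated on the `GITHUB_ACTIONS` environment variable, which GitHub Actions sets to `true` on its runners. This is only a minimal sketch: `CiGuard` and its methods are hypothetical names and are not part of metarank, ltrlib or lightgbm4j.

```java
import java.util.Map;

// Hypothetical helper, not part of metarank: decides whether a test that
// trains a LightGBM model should run in the current environment.
final class CiGuard {
    private CiGuard() {}

    // GitHub Actions sets GITHUB_ACTIONS=true on its runners.
    static boolean isGithubActions(Map<String, String> env) {
        return "true".equalsIgnoreCase(env.get("GITHUB_ACTIONS"));
    }

    // Train everywhere except on GitHub Actions, where random_device fails.
    static boolean shouldTrainLightGBM(Map<String, String> env) {
        return !isGithubActions(env);
    }

    public static void main(String[] args) {
        boolean train = shouldTrainLightGBM(System.getenv());
        System.out.println(train
            ? "training LightGBM model"
            : "skipping LightGBM training on GitHub Actions");
    }
}
```

Passing the environment in as a map, rather than reading `System.getenv()` inside the check, keeps the guard easy to exercise locally with a fake environment.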

shuttie • Apr 30 '22 10:04

With the Flink removal, the issue seems to be fixed somehow. Perhaps it was a classloader and JNI issue?

shuttie • Aug 27 '22 15:08