metarank
LightGBM training on CI fails with gcc error
In RanklensTest, only when running in GitHub Actions CI:
[info] io.github.metarank.lightgbm4j.LGBMException: random_device could not be read
[info] at io.github.metarank.lightgbm4j.LGBMBooster.create(LGBMBooster.java:262)
[info] at io.github.metarank.ltrlib.booster.LightGBMBooster$.apply(LightGBMBooster.scala:39)
[info] at io.github.metarank.ltrlib.booster.LightGBMBooster$.apply(LightGBMBooster.scala:19)
[info] at io.github.metarank.ltrlib.ranking.pairwise.LambdaMART.fit(LambdaMART.scala:50)
[info] at ai.metarank.mode.train.Train$.trainModel(Train.scala:55)
[info] at ai.metarank.e2e.RanklensTest.$anonfun$new$3(RanklensTest.scala:82)
[info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info] at org.scalatest.Transformer.apply(Transformer.scala:22)
The root cause is https://github.com/actions/virtual-environments/issues/672: the LightGBM native library uses too much entropy during training, and the GHA VM has some issues with its randomness sources. Right now we don't train the LGBM model in GHA (our experiment with a predefined seed in ltrlib also failed), but we're investigating what else can be done here.
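To narrow this down on a runner, a small JVM-side probe might help. The sketch below assumes the native error comes from a failed read of an OS randomness device such as /dev/urandom; that is a guess, not something confirmed in this thread. `EntropyCheck` and `canRead` are hypothetical names, not part of lightgbm4j or ltrlib.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical diagnostic helper, not part of lightgbm4j or ltrlib.
class EntropyCheck {
    // Returns true only if `count` bytes could be read from `path`.
    public static boolean canRead(String path, int count) {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] buf = new byte[count];
            int off = 0;
            while (off < count) {
                int n = in.read(buf, off, count - off);
                if (n < 0) {
                    return false; // stream ended before `count` bytes arrived
                }
                off += n;
            }
            return true;
        } catch (IOException | RuntimeException e) {
            // Missing device, permission problem, or failed read
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: /dev/urandom is the device the native code reads from
        String dev = "/dev/urandom";
        System.out.println(dev + " readable: " + canRead(dev, 16));
    }
}
```

Running this as a CI step just before the test would show whether the device is readable from the JVM process at all. If it prints `true` while LightGBM still fails, the problem is more likely in how the native library is loaded than in the VM's entropy source.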
With the Flink removal, the issue seems to be somehow fixed. Maybe it was a classloader and JNI issue?