sparkxgb

feature request: Feature Importance of models

Open ElianoMarques opened this issue 5 years ago • 3 comments

Would it be possible to add feature importance to the package?

ElianoMarques, Apr 29 '19 17:04

This probably requires updating xgboost4j-spark to at least 0.82 in order to use Booster.getScore():

https://github.com/dmlc/xgboost/commit/431c850c03edd97a126707d2f04f251838392fe9

sparkxgb currently uses 0.81:

https://github.com/rstudio/sparkxgb/blob/db7a68a3f1b1ce675e05b2c79da4992c4a2f17bf/R/dependencies.R#L11

(I tried to make a PR for this, but couldn't figure out how to use sparklyr::compile_package_jars() with a dependency jar...)

yutannihilation, Oct 21 '19 08:10

I would try bumping the version in dependencies.R to the newer one; that should be about it. To recompile the jars you need to run https://github.com/rstudio/sparkxgb/blob/master/configure.R, but I don't think that's required since the Scala code has not changed. It would be great if you could send a PR, because that way we can see what breaks in the tests and help out a bit as well.

javierluraschi, Oct 23 '19 05:10

Thanks!

> I don't think that's required since the Scala code is not changed.

But doesn't inst/java/sparkxgb-*.jar contain everything, including the dependency? If so, changing the dependency should require a rebuild. Sorry, I'm not very familiar with the Java ecosystem or the structure of a JAR...

yutannihilation, Oct 23 '19 06:10
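
One way to check whether the packaged JAR bundles the dependency is to look inside it, since a JAR is an ordinary ZIP archive. The following is a minimal sketch, not part of sparkxgb. It assumes that xgboost4j classes would appear under the path `ml/dmlc/xgboost4j/` (inferred from the library's package naming), and the helper names `bundled_entries` and `bundles_dependency` are hypothetical.

```python
import glob
import zipfile

# Assumption: bundled xgboost4j classes would sit under this path in the JAR.
XGBOOST_PREFIX = "ml/dmlc/xgboost4j/"


def bundled_entries(jar_path, prefix=XGBOOST_PREFIX):
    """Return the archive entries in the JAR whose path starts with `prefix`."""
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist() if name.startswith(prefix)]


def bundles_dependency(jar_path, prefix=XGBOOST_PREFIX):
    """Return True if the JAR contains at least one entry under `prefix`."""
    return len(bundled_entries(jar_path, prefix)) > 0


if __name__ == "__main__":
    # Run from the root of a sparkxgb checkout; the glob mirrors the
    # inst/java/sparkxgb-*.jar pattern mentioned in the thread.
    for path in sorted(glob.glob("inst/java/sparkxgb-*.jar")):
        status = "bundles" if bundles_dependency(path) else "does not bundle"
        print(f"{path}: {status} entries under {XGBOOST_PREFIX}")
```

If no entries are found under the prefix, the dependency is presumably resolved separately at runtime, in which case bumping the version in dependencies.R would not by itself call for rebuilding the JARs. If entries are found, a rebuild would likely be needed for the new version to take effect.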