xgboost-distribution
How to contribute?
Hi @CDonnerer,
Thanks for a great package!
I am a bit unsure about how to contribute to your package; for example, I have never used tox before, and I couldn't get the tests to pass when I tried in my own fork.
Also, I am personally more used to Poetry, but that shouldn't be too much of an issue.
However, I needed a bit more functionality in your package, so I have added a few things:
- First of all, I have renamed `params` to `transformed_params` to make it clearer where the parameters are transformed and where they are not. I have also added the helper function `predict_transformed_params`, which `predict` now depends on.
- I also needed quantile predictions to be more easily accessible. As such, I have included the function `predict_quantiles` and added it to all of the distributions (and the base). See e.g. here for `NegativeBinomial`. A rough sketch of the idea follows this list.
- While adding the above, I also added access to XGBoost's own cross-validation method `cv`.
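To make this concrete, here is a rough sketch of the kind of helper I have in mind (normal distribution only; the exact names and the assumption that `predict` returns the transformed `loc`/`scale` parameters are illustrative rather than the code in my fork):

```python
import numpy as np
from scipy import stats


def predict_quantiles(model, X, quantiles=(0.05, 0.5, 0.95)):
    """Quantiles of the predicted distribution (illustrative sketch).

    Assumes a fitted model whose predict() returns the transformed
    parameters of a normal distribution as `loc` and `scale`.
    """
    params = model.predict(X)
    # One column per requested quantile
    return np.column_stack(
        [stats.norm.ppf(q, loc=params.loc, scale=params.scale) for q in quantiles]
    )
```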
All tests pass; however, I should probably also add a few tests of my own (in addition to the added `cv` example).
Please let me know how you prefer contributions so that I can add the above changes :)
Hi, unfortunately at the moment I do not have guidelines for contributions. This may change in the future, but right now the best way is to raise issues for feature requests / bug fixes.
In terms of the above:
- What is the use case for exposing the raw parameters? You could already do this by accessing the underlying booster object.
- For the quantiles & cv methods, in general this package tries to closely follow the scikit-learn xgboost API, which makes those fit less well. I'd argue that quantiles belong in your application code: the distribution you get is already fully specified, so you can derive any quantities you want from it (see the sketch below). On the cv side, it'd be good to know whether it offers benefits over the "regular" scikit-learn framework.
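For instance, something along these lines should already be possible in application code (a sketch only, assuming a normal distribution whose predicted parameters are exposed as `loc` and `scale`):

```python
from scipy import stats
from sklearn.datasets import make_regression
from xgboost_distribution import XGBDistribution

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

model = XGBDistribution(distribution="normal", n_estimators=50)
model.fit(X, y)

# predict() returns the fitted distribution parameters for each row
preds = model.predict(X)

# Any quantity of the predictive distribution can be derived directly,
# e.g. a 90% prediction interval:
lower = stats.norm.ppf(0.05, loc=preds.loc, scale=preds.scale)
upper = stats.norm.ppf(0.95, loc=preds.loc, scale=preds.scale)
```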
Well, it was also just a proposal in case you wanted more external contributors.
- Well, the main reason is just to make it easier for users to inspect them if they want to. It also leads to more code reuse, at least if one also uses the quantile functions.
2a. Personally, I do not see an issue with adding extra functions, as long as the scikit-learn API is otherwise followed. Being able to predict quantiles was one of the things about XGBoostLSS that I really liked and thought your package was missing. Currently, to get quantiles, one has to know about the specific likelihood distribution and manually write the scipy stats code. I believe that adding this to the package would greatly improve the user experience.
2b. Regarding the cv part, I needed more advanced hyperparameter optimization strategies than the simple ones that scikit-learn allows (grid search and random search); in particular, I prefer Bayesian optimization. For this, the `cv` method from XGBoost is really helpful, since it allows using custom objective and metric functions, missing and categorical data, weights and base margins, and so on. A rough sketch of the workflow follows below.
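To illustrate, this is roughly the kind of loop I mean, shown here with plain XGBoost, a built-in metric, and Optuna as one example of a Bayesian optimizer (a sketch only; the distributional objective and metric would be supplied via cv's custom objective/metric arguments):

```python
import optuna
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=10, noise=5.0, random_state=0)
dtrain = xgb.DMatrix(X, label=y)


def objective(trial):
    # Hyperparameters proposed by the Bayesian optimizer
    params = {
        "objective": "reg:squarederror",
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 10),
        "subsample": trial.suggest_float("subsample", 0.5, 1.0),
    }
    cv_results = xgb.cv(
        params,
        dtrain,
        num_boost_round=500,
        nfold=5,
        early_stopping_rounds=20,
        metrics="rmse",
        seed=0,
    )
    # Best mean validation score across boosting rounds
    return cv_results["test-rmse-mean"].min()


study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=25)
print(study.best_params)
```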