
How to contribute?

ChristianMichelsen opened this issue · 2 comments

Hi @CDonnerer,

Thanks for a great package!

I am a bit unsure how to contribute to your package; I have, for example, never used tox before, and I couldn't get the tests to pass when I tried in my own fork.

Also, I am personally more used to Poetry, but that shouldn't be too much of an issue.

However, I needed a bit more functionality in your package, so I have added a few things:

  1. First of all, I have renamed params to transformed_params to make it clearer where the parameters are transformed and where they are not. I have also added the helper function predict_transformed_params, which predict now depends on.

  2. I also needed quantile predictions to be more easily accessible, so I have added a predict_quantiles function to the estimator and to all of the distributions (and the base class). See e.g. here for NegativeBinomial.

  3. While adding the above, I also added access to XGBoost's own cross-validation method cv. (A rough sketch of the proposed usage follows below.)
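A rough sketch of how the proposed additions might be used. The method names follow the proposal above, but the signatures, the quantiles argument, and the data setup are hypothetical, since the fork itself is not shown here:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost_distribution import XGBDistribution

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = XGBDistribution(distribution="normal", n_estimators=100)
model.fit(X_train, y_train)

# Hypothetical method from the fork: per-sample distribution parameters,
# transformed to their natural scale
params = model.predict_transformed_params(X_test)

# Hypothetical method from the fork: quantile predictions at given levels
quantiles = model.predict_quantiles(X_test, quantiles=[0.05, 0.5, 0.95])
```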

All tests pass; however, I should probably also add a few tests of my own (in addition to the added cv example).

Please let me know how you prefer contributions so that I can add the above changes :)

ChristianMichelsen commented on Feb 15, 2023

Hi, unfortunately I do not currently have guidelines for contributions. This may change in the future, but right now the best way is to raise issues for feature requests / bug fixes.

In terms of the above:

  1. What is the use case for exposing the raw parameters? You can already do this by accessing the underlying booster object.
  2. For the quantiles & cv methods: in general, this package tries to closely follow the scikit-learn XGBoost API, which makes those fit less well. I'd argue that quantiles belong in your application code, as the distribution you get back is already fully specified, so you can derive any quantities you want from it (see the sketch below). On the cv side, it would be good to know whether this offers benefits over the "regular" scikit-learn framework.
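For illustration, a minimal sketch of deriving quantiles in application code from the fully specified distribution, assuming the normal distribution (whose predict output exposes loc and scale fields) and synthetic placeholder data:

```python
import scipy.stats as st
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from xgboost_distribution import XGBDistribution

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
X_train, X_test, y_train, _ = train_test_split(X, y, random_state=0)

model = XGBDistribution(distribution="normal")
model.fit(X_train, y_train)

# predict returns the fitted distribution's parameters for each test sample
preds = model.predict(X_test)

# The distribution is fully specified, so quantiles follow directly from scipy
q05 = st.norm.ppf(0.05, loc=preds.loc, scale=preds.scale)
q95 = st.norm.ppf(0.95, loc=preds.loc, scale=preds.scale)
```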

CDonnerer commented on Feb 20, 2023

Well, it was also just a proposal in case you wanted more external contributors.

  1. Well, the main reason is just to make it easier for users to inspect the parameters if they want to. It also leads to more code reuse, at least if one also uses the quantile functions.

  2a. Personally, I do not see an issue with adding extra functions, as long as the scikit-learn API is otherwise followed. Being able to predict quantiles was one of the things about XGBoostLSS that I really liked and thought your package was missing. Currently, to get quantiles, one has to know the specific likelihood distribution and manually implement the scipy.stats code. I believe adding this to the package would greatly improve the user experience.

  2b. Regarding the cv part: I needed more advanced hyperparameter optimization strategies than the simple ones scikit-learn provides (grid search and random search); in particular, I prefer Bayesian optimization. For this, the cv method from XGBoost is really helpful, and it allows using custom objective and metric functions, missing and categorical data, weights and base margins, and so on (see the sketch below).
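As a sketch of that workflow, using plain xgboost.cv inside an Optuna study for the Bayesian search. The proposal would expose the analogous cv call through the package; the parameter ranges and data here are placeholders:

```python
import optuna
import xgboost as xgb
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)
dtrain = xgb.DMatrix(X, label=y)

def objective(trial):
    params = {
        "objective": "reg:squarederror",
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "max_depth": trial.suggest_int("max_depth", 2, 10),
    }
    # xgboost's own cv supports early stopping, custom objectives/metrics,
    # weights, base margins, etc., directly on the DMatrix
    result = xgb.cv(params, dtrain, num_boost_round=500, nfold=5,
                    early_stopping_rounds=20, seed=0)
    return result["test-rmse-mean"].min()

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=25)
```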

ChristianMichelsen commented on Feb 21, 2023