Anna Veronika Dorogush
@cdeterman Sorry for the delay in responding. We'll come back to you with more details later.
https://github.com/catboost/catboost/pull/715 - here is a PR that prepares for CRAN. @ws171913 is working on adding CatBoost to CRAN; please look at the PR and comment if you have any...
It would be great to add it. I'm adding a help wanted tag because we will not be able to implement it in the near future. But it is easy...
We want the algorithm to work the same way whether you train from a file or from a matrix. All values of a categorical feature are treated as strings, including the nan value. To...
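To illustrate the point about file and matrix input agreeing, here is a minimal sketch using only the Python standard library. It is not CatBoost code: `as_category` is a hypothetical helper, and the column name and values are made up for the example. It only shows that once every categorical cell is rendered as a string, a nan read from a file and a float nan held in memory end up as the same category label.

```python
import csv
import io
import math


def as_category(value):
    """Hypothetical helper (not a CatBoost API): render a categorical cell as a string."""
    if isinstance(value, float) and math.isnan(value):
        # A missing value becomes an ordinary category label.
        return "nan"
    return str(value)


# The same column, once as file contents and once as an in-memory matrix.
file_text = "color\nred\nnan\nblue\n"
from_file = [row["color"] for row in csv.DictReader(io.StringIO(file_text))]
from_matrix = [as_category(v) for v in ["red", float("nan"), "blue"]]

# Both sources now carry identical string categories.
print(from_file == from_matrix)
```

Under this sketch a float such as 1.0 would become the string "1.0", while a file might hold "1" for the same value, so float-valued categories stay ambiguous even after conversion.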
It's not as obvious as it looks. The problem here is that when you read your dataset into a data frame in Python, all your 'NaN', 'nan', 'NA'...
https://catboost.ai/docs/concepts/faq.html#why-float-and-nan-values-are-forbidden-for-cat-features - here are the docs about nan and float categorical features. Here is a description of the proposed solution, which is a great way to contribute to CatBoost: https://github.com/catboost/catboost/blob/master/open_problems/open_problems.md (adding...
@szilard Actually, we did tune the code to run much faster; it should now be faster than XGBoost and on par with LightGBM. We are working on more speedups now.
@szilard We have also implemented GPU training. We compared on the Epsilon dataset, and it's 2 times faster than LightGBM and 20 times faster than XGBoost. It would be nice...
If you are running the latest version built from the code on GitHub, then it is correct. But if I understand correctly, you are running benchmarks on the airlines dataset - this...
Yes, I forgot about a large one-hot-max size. We didn't add it because using statistics gives better quality for cat features with many values. With cat features...