EvoTrees.jl
Boosted trees in Julia
EvoTrees 
A Julia implementation of boosted trees with CPU and GPU support. Efficient histogram-based algorithms with support for multiple loss functions (notably multi-target objectives such as maximum likelihood methods).
Input features are expected to be Matrix{Float64} or Matrix{Float32} when using the internal API. Tables and DataFrames inputs can be handled through MLJ. See the docs for further details.
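As a minimal sketch of preparing input for the internal API, the snippet below assembles column vectors into a dense Matrix{Float32}. The column names and values are made up for illustration; only the target type comes from the requirement above.

```julia
# Hypothetical raw columns (names and values are illustrative only).
age    = [23, 35, 47, 59]            # Vector{Int}
income = [31.5, 48.0, 52.25, 60.0]   # Vector{Float64}

# hcat places the vectors side by side as columns (promoted to Float64 here);
# Float32.(...) then converts every element to Float32.
X = Float32.(hcat(age, income))

# One row per observation, one column per feature.
@assert X isa Matrix{Float32}
```

The same pattern works with Float64.(...) if double precision is preferred.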
Supported tasks
CPU
- linear
- logistic
- Poisson
- L1 (MAE regression)
- Quantile
- multiclassification (softmax)
- Gaussian (maximum likelihood)
Set parameter device="cpu".
GPU
- linear
- logistic
- Gaussian (maximum likelihood)
Set parameter device="gpu".
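Since the GPU list is a subset of the CPU list, a small guard can avoid requesting the GPU for an unsupported task. The sketch below is a hypothetical helper, not part of EvoTrees; the loss symbols are illustrative labels for the list entries above and may not match the exact names the package expects.

```julia
# Tasks listed under "GPU" above (labels are assumptions for this sketch).
gpu_losses = (:linear, :logistic, :gaussian)

# Hypothetical helper: return "gpu" only when it is requested and the
# loss appears in the GPU list; otherwise fall back to "cpu".
pick_device(loss::Symbol; want_gpu::Bool=true) =
    (want_gpu && loss in gpu_losses) ? "gpu" : "cpu"

dev_linear   = pick_device(:linear)                  # in the GPU list
dev_quantile = pick_device(:quantile)                # CPU only
dev_forced   = pick_device(:linear; want_gpu=false)  # GPU not requested
```

The returned string could then be passed as the device parameter, for example EvoTreeRegressor(loss=:linear, device=pick_device(:linear)).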
Installation
Latest:
julia> Pkg.add(url="https://github.com/Evovest/EvoTrees.jl")
From General Registry:
julia> Pkg.add("EvoTrees")
Performance
Data consists of randomly generated Float32 values. Training is performed for 200 iterations. Code to reproduce is here.
EvoTrees: v0.8.4
XGBoost: v1.1.1
CPU: 16 threads on AMD Threadripper 3970X
GPU: NVIDIA RTX 2080
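For readers who want to time their own runs, the following is a rough sketch of a comparable setup. It is not the benchmark script referenced above: make_data and time_fit are hypothetical helpers, the problem size is reduced, and the timed workload is a placeholder.

```julia
using Random

# Generate a random Float32 feature matrix and target vector.
function make_data(nobs::Int, nfeats::Int; seed::Int=123)
    rng = MersenneTwister(seed)
    X = rand(rng, Float32, nobs, nfeats)
    Y = rand(rng, Float32, nobs)
    return X, Y
end

# Hypothetical timing helper: call `f` once so that JIT compilation is
# not counted, then return the wall-clock seconds of a second call.
function time_fit(f, X, Y)
    f(X, Y)                  # warm-up run (triggers compilation)
    return @elapsed f(X, Y)  # timed run
end

X, Y = make_data(10_000, 100)

# Placeholder workload; swap in the training call to be measured.
elapsed = time_fit((X, Y) -> sum(X; dims=1), X, Y)
```

The warm-up call matters in Julia because the first invocation of a function includes compilation time, which would otherwise inflate the measurement.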
Training:
| Dimensions / Algo | XGBoost Hist | EvoTrees | EvoTrees GPU |
|---|---|---|---|
| 100K x 100 | 1.10s | 1.80s | 3.14s |
| 500K x 100 | 4.83s | 4.98s | 4.98s |
| 1M x 100 | 9.84s | 9.89s | 7.37s |
| 5M x 100 | 45.5s | 53.8s | 25.8s |
Inference:
| Dimensions / Algo | XGBoost Hist | EvoTrees | EvoTrees GPU |
|---|---|---|---|
| 100K x 100 | 0.164s | 0.026s | 0.013s |
| 500K x 100 | 0.796s | 0.175s | 0.055s |
| 1M x 100 | 1.59s | 0.396s | 0.108s |
| 5M x 100 | 7.96s | 2.15s | 0.543s |
MLJ Integration
See the official project page for more info.
Getting started using internal API
using EvoTrees

params1 = EvoTreeRegressor(
    loss=:linear,
    metric=:mse,
    nrounds=100,
    nbins=100,
    lambda=0.5,
    gamma=0.1,
    eta=0.1,
    max_depth=6,
    min_weight=1.0,
    rowsample=0.5,
    colsample=1.0)

model = fit_evotree(params1, X_train, Y_train, X_eval=X_eval, Y_eval=Y_eval, print_every_n=25)
preds = predict(model, X_eval)
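The call above assumes that X_train, Y_train, X_eval and Y_eval already exist. One way to build them is sketched below with synthetic data and a random 80/20 split. The data, the split ratio and the mse helper are assumptions made for illustration only.

```julia
using Random, Statistics

rng = MersenneTwister(42)

# Synthetic regression problem: 1_000 observations, 5 features,
# with the target driven by the first feature plus Gaussian noise.
X = rand(rng, Float32, 1_000, 5)
Y = 2f0 .* X[:, 1] .+ 0.1f0 .* randn(rng, Float32, 1_000)

# Shuffle the row indices; keep 80% for training, the rest for evaluation.
idx = randperm(rng, size(X, 1))
n_train = round(Int, 0.8 * length(idx))
train_idx, eval_idx = idx[1:n_train], idx[n_train+1:end]

X_train, Y_train = X[train_idx, :], Y[train_idx]
X_eval, Y_eval = X[eval_idx, :], Y[eval_idx]

# Mean squared error, for checking predictions by hand.
mse(pred, y) = mean(abs2, pred .- y)
```

Once predictions are available, they could be compared against Y_eval with the mse helper.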
Feature importance
Returns the normalized gain by feature.
features_gain = importance(model, var_names)
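To illustrate what normalization means here, the sketch below scales per-feature gains so that they sum to 1. This assumes that is the normalization used; the gain values and feature names are invented, and the code is not EvoTrees' internal implementation.

```julia
# Hypothetical total split gain accumulated per feature.
raw_gain = Dict("x1" => 12.0, "x2" => 6.0, "x3" => 2.0)

# Divide each gain by the overall total so the shares sum to 1.
total = sum(values(raw_gain))
normalized = Dict(name => gain / total for (name, gain) in raw_gain)

# Sort from most to least important for display.
ranked = sort(collect(normalized); by=last, rev=true)
```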
Plot
Plot a given tree of the model:
plot(model, 2)

Note that the first tree is used to set the bias, so the first real tree is #2.
Save/Load
EvoTrees.save(model, "data/model.bson")
model = EvoTrees.load("data/model.bson");