EvoTrees.jl

Boosted trees in Julia

EvoTrees

A Julia implementation of boosted trees with CPU and GPU support. It provides efficient histogram-based algorithms and supports multiple loss functions, notably multi-target objectives such as maximum likelihood methods.

An R binding is available.

Input features are expected to be a Matrix{Float64} or Matrix{Float32} when using the internal API. Tables and DataFrames inputs can be handled through MLJ. See the docs for further details.
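
As a minimal sketch of preparing input for the internal API, the snippet below assembles separate columns into an observations x features matrix and converts it to Float32. The column names and values are hypothetical and only serve as an illustration; it uses base Julia only.

```julia
# Hypothetical raw data: integer-valued columns stored as separate vectors.
age = [25, 32, 47, 51]
income = [40_000, 52_000, 81_000, 67_000]

# Concatenate the columns into a 4 x 2 matrix (rows are observations),
# then convert every element to Float32 by broadcasting.
X = Float32.(hcat(age, income))

println(typeof(X))
println(size(X))
```

The same broadcast works with Float64 if double precision is preferred.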

Supported tasks

CPU

  • linear
  • logistic
  • Poisson
  • L1 (MAE regression)
  • Quantile
  • multiclass classification (softmax)
  • Gaussian (max likelihood)

Set parameter device="cpu".

GPU

  • linear
  • logistic
  • Gaussian (max likelihood)

Set parameter device="gpu".
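
A possible way to select the device, assuming that device is accepted as a keyword argument by the same EvoTreeRegressor constructor shown in the getting-started section (an assumption based on the parameter name above, not a confirmed signature), and that a CUDA-capable GPU is available:

```julia
using EvoTrees

# Assumption: `device` is passed like any other hyper-parameter.
# Only losses listed under GPU above (linear, logistic, Gaussian) apply here.
params_gpu = EvoTreeRegressor(loss=:linear, device="gpu")

# The CPU equivalent, for comparison.
params_cpu = EvoTreeRegressor(loss=:linear, device="cpu")
```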

Installation

Latest:

julia> Pkg.add(url="https://github.com/Evovest/EvoTrees.jl")

From General Registry:

julia> Pkg.add("EvoTrees")

Performance

Data consists of randomly generated Float32 values. Training runs for 200 iterations. Code to reproduce is here.

EvoTrees: v0.8.4
XGBoost: v1.1.1

CPU: 16 threads on AMD Threadripper 3970X
GPU: NVIDIA RTX 2080

Training:

Dimensions / Algo   XGBoost Hist   EvoTrees   EvoTrees GPU
100K x 100          1.10s          1.80s      3.14s
500K x 100          4.83s          4.98s      4.98s
1M x 100            9.84s          9.89s      7.37s
5M x 100            45.5s          53.8s      25.8s

Inference:

Dimensions / Algo   XGBoost Hist   EvoTrees   EvoTrees GPU
100K x 100          0.164s         0.026s     0.013s
500K x 100          0.796s         0.175s     0.055s
1M x 100            1.59s          0.396s     0.108s
5M x 100            7.96s          2.15s      0.543s

MLJ Integration

See the official project page for more information.

Getting started using internal API

using EvoTrees

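# X_train, Y_train, X_eval and Y_eval are assumed to be defined beforehand
# (Float64/Float32 feature matrices and their matching targets).
params1 = EvoTreeRegressor(
    loss=:linear,
    metric=:mse,
    nrounds=100,
    nbins=100,
    lambda=0.5,
    gamma=0.1,
    eta=0.1,
    max_depth=6,
    min_weight=1.0,
    rowsample=0.5,
    colsample=1.0)

# Train the model, reporting on the evaluation data every 25 rounds
model = fit_evotree(params1, X_train, Y_train, X_eval=X_eval, Y_eval=Y_eval, print_every_n=25)

# Predict on the evaluation features
preds = predict(model, X_eval)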
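
The call above relies on X_train, Y_train, X_eval and Y_eval already existing. One way to obtain them for a quick trial is sketched below with synthetic data and a random 80/20 split. The sizes, seed and target formula are arbitrary choices for illustration, and only the Random standard library is used.

```julia
using Random

Random.seed!(123)

# Synthetic regression data: 1_000 observations and 5 features (arbitrary sizes).
nobs, nfeats = 1_000, 5
X = rand(Float32, nobs, nfeats)

# The target depends on the first two features plus a little Gaussian noise.
Y = 2f0 .* X[:, 1] .- X[:, 2] .+ 0.1f0 .* randn(Float32, nobs)

# Shuffle the row indices and hold out 20% of the rows for evaluation.
idx = randperm(nobs)
ntrain = round(Int, 0.8 * nobs)
train_idx, eval_idx = idx[1:ntrain], idx[ntrain+1:end]

X_train, Y_train = X[train_idx, :], Y[train_idx]
X_eval, Y_eval = X[eval_idx, :], Y[eval_idx]
```

These arrays have the Matrix{Float32} feature layout described earlier and can be passed to fit_evotree as shown.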

Feature importance

Returns the normalized gain by feature.

features_gain = importance(model, var_names)
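
To illustrate what a normalized gain looks like, the sketch below assumes that "normalized" means each feature's gain is divided by the total so the values sum to one. The feature names and raw gains are made up, and this is not the package's internal code.

```julia
# Hypothetical raw gains accumulated per feature (illustrative values only).
raw_gain = Dict("x1" => 6.0, "x2" => 3.0, "x3" => 1.0)

# Divide each gain by the total so that the values sum to one.
total = sum(values(raw_gain))
normalized = Dict(name => gain / total for (name, gain) in raw_gain)

# Sort the (name => share) pairs from most to least important.
ranked = sort(collect(normalized); by=last, rev=true)

for (name, share) in ranked
    println(name, ": ", share)
end
```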

Plot

Plot a given tree of the model:

plot(model, 2)

Note that the first tree is used to set the bias, so the first real tree is #2.

Save/Load

EvoTrees.save(model, "data/model.bson")
model = EvoTrees.load("data/model.bson");