
How to plot a decision tree (using a graphics package)

Open roland-KA opened this issue 2 years ago • 57 comments

Is there a possibility to plot a decision tree using Plots.jl (or some other graphics package)?

I'm using MLJ and the only means to visualize a decision tree seems to be report(mach).print_tree(n) where mach is the trained machine.

If there is no such possibility: How can I access the tree (data structure) directly in MLJ?

roland-KA avatar Jan 19 '22 19:01 roland-KA

I think I've solved most of the questions above: Using AbstractTrees.jl and GraphRecipes.jl it's relatively easy to implement.

  • The decision tree can be accessed via fitted_params(mach).tree.
  • Then the AbstractTrees functions children and printnode have to be implemented:
using AbstractTrees

function AbstractTrees.children(node::DecisionTree.Node)
	return(node.left, node.right)
end

function AbstractTrees.printnode(io::IO, node::DecisionTree.Node)
	print(io, "ID: ", node.featid, " - ", node.featval)
end

function AbstractTrees.printnode(io::IO, leaf::DecisionTree.Leaf)
	print(io, "maj: ", leaf.majority, " - vals: ", length(leaf.values))
end
  • Finally GraphRecipes can be used together with Plots (dtree is the decision tree to be plotted):
using GraphRecipes, Plots

plot(TreePlot(dtree), method = :tree, nodeshape = :ellipse)
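
A quick way to verify the interface before plotting is AbstractTrees' built-in text rendering (a sketch, assuming dtree is the tree obtained via fitted_params(mach).tree):

using AbstractTrees

# With children and printnode defined as above, AbstractTrees can already
# render the tree as text - a handy sanity check before plotting.
AbstractTrees.print_tree(dtree)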

Unfortunately DecisionTree.Node stores only the id of the feature used for a split (featid). It would be nice if the feature name could also be shown. Within the DecisionTree package there exists an array feature_names with these names, but I didn't find a way to access it. How can this be done (in printnode)?

roland-KA avatar Jan 19 '22 22:01 roland-KA

Hi @roland-KA, I have created a function to plot decision tree using CairoMakie (https://github.com/Rahulub3r/Julia-ML-Utils/blob/main/decisionTreeUtils.jl). Although this is a minimum working example, I think it can be changed to be production material for integrating into the package. I am interested in contributing. Let me know if this looks good.

Rahulub3r avatar Feb 07 '22 23:02 Rahulub3r

Hi @Rahulub3r, thank you for your efforts! Could you perhaps give a working example of how to call drawTree? I.e. which initial values should be used to draw a tree from its root down to the leaves?

roland-KA avatar Feb 10 '22 12:02 roland-KA

Sure. An MWE is as follows. Suggestions are appreciated.

using MLJ
using DecisionTree
using CairoMakie
using Random

X, y = make_blobs(300;rng=MersenneTwister(1234))

dtc = @load DecisionTreeClassifier pkg=DecisionTree verbosity=0
dtc_model = dtc(min_purity_increase=0.005, min_samples_leaf=1, min_samples_split=2, max_depth=3)
dtc_mach = machine(dtc_model, X, y)
MLJ.fit!(dtc_mach)
x = fitted_params(dtc_mach)
#print_tree(x.tree)

f = Figure(;resolution=(1000, 800))
ax1 = Axis(f[1,1])
drawTree(x.tree, x.encoding, ax1; feature_names=["X1", "X2"], 
        nodetextsize=20, nodetextcolor=:black, nodewth=12,
        linetextsize=13, leaftextsize=13, leafwth=4)
hidespines!(ax1)
hidedecorations!(ax1)
f

[image: decision tree plotted with CairoMakie]

Rahulub3r avatar Feb 10 '22 14:02 Rahulub3r

Thanks, that looks really good! 👍

roland-KA avatar Feb 10 '22 17:02 roland-KA

Very cool! I wonder what the best way to integrate this contribution might be. It would be great for MLJ users to be able to do this (without adding Makie as a dependency to existing MLJ packages).

@Rahulub3r Any interest in helping out with visualisation in MLJ more generally?

Probably missing something but why is it necessary to specify feature names? Are they not part of x.encoding?

ablaom avatar Feb 13 '22 21:02 ablaom

Yes, I am interested. However, I am not sure how we would implement them without Makie. RE feature names, if you do not specify them, they will be shown as Feature 1, Feature 2, .. in the plot, but if you specify the names, you will see them as shown in the plot above. x.encoding contains the class label information and not the features.

Rahulub3r avatar Feb 13 '22 23:02 Rahulub3r

x.encoding contains the class label information and not the features.

Yes, of course you're right 👍🏾 . Perhaps we could get the MLJ interface to expose the feature names (as we do in, e.g., MLJGLMInterface).

ablaom avatar Feb 14 '22 00:02 ablaom

As far as I can see, we have now three options to visualize a decision tree:

  1. Using GraphRecipes (with Plots in the background and the decision tree 'extended' to an AbstractTree) as described in my post above.
  2. The code developed by @Rahulub3r, which implements its own tree layout algorithm using Makie.jl.
  3. Using Graphs (with GraphMakie in the background).

No. 2 is without doubt visually the most beautiful solution. But I think we should also consider some software engineering aspects:

  • Maintenance: I think it is better to build on existing efforts to keep our own development and maintenance effort low (our resources are scarce). No. 1 & 3 only need some 'translation' of the decision tree to another data structure and then use ready-made layout and plotting algorithms.
  • Loose coupling: As @ablaom mentioned, there shouldn't be a direct dependency between MLJ and a graphics package. With option 1 every package capable of plotting an AbstractTree will do the job and with option 3 the same holds for every package capable of plotting a Graph. So we have a decoupling using these abstract data types. And of course it would be helpful, if MLJ could provide the class labels as well as the feature names through a clear interface. That's a good idea @ablaom!

I've been digging a bit deeper into options 1 & 3 recently, so some more comments on that:

Option 1: Using class labels and attribute names (from 'somewhere', ideally from an MLJ interface 😊), trees with all that information can be plotted using the means stated above.

BTW: I've started a three-part tutorial on 'Towards Data Science' on how Julia and its ecosystem can be used for ML (for an audience that doesn't know Julia). The first part was published last week. The third part uses exactly no. 1 as an example to show how easy it is in Julia to 'glue' several packages together and create a new solution with little coding. So there will be an extensive description of how to do it.

Along the way, I identified an issue with the TreePlot recipe (JuliaPlots/GraphRecipes.jl#172), which has to be resolved before we can use it to produce correct tree plots.

Option 3: In order to get a 'beautiful' tree plot, some extra work is needed, as the standard output from GraphMakie is a bit basic and doesn't offer more out of the box, as I learned here: JuliaPlots/GraphMakie.jl#57

I can provide a code example on how to apply this option next week (I'm a bit busy this week).

roland-KA avatar Feb 17 '22 11:02 roland-KA

@roland-KA Very nice summary of the issues and best options. I agree that while @Rahulub3r's code is a lovely bit of work, a more generic solution is preferred (1 or 3).

It probably makes sense to have a separate "MLJPlotRecipes" package for plotting things that are very specific to MLJ (we have, for example, a recipe in MLJTuning to plot outcomes of hyper-parameter optimization). But this wouldn't fall into that category.

I've opened https://github.com/JuliaAI/MLJDecisionTreeInterface.jl/issues/13. This should be pretty easy (PR welcome 😄 ).

Makie.jl or Plots.jl? That's a very difficult question. I'm waiting for someone to convince me one way or the other 😉 . Of course one could do both, starting with Plots.jl which seems more ready to roll.

See also the MLJ visualization project in this GSoC list.

ablaom avatar Feb 18 '22 01:02 ablaom

I think a separate "MLJPlotRecipes" package is a good idea, as there are surely more models that could be visualized (and it would be another advantage of MLJ over similar ML-packages in general).

As a first step, an implementation for Plots.jl would be preferable in my opinion too, because:

  • Plots.jl is more mature in comparison. In Makie.jl there are still quite a few loose ends and a lot of changes going on.
  • The maintainer of GraphMakie.jl informed me that he is planning to extend that package with the functionality we need (see: JuliaPlots/GraphMakie.jl#57). So it doesn't have to be implemented for decision trees in MLJPlotRecipes.

... and if time permits, I will have a look at JuliaAI/MLJDecisionTreeInterface.jl#13 😊

roland-KA avatar Feb 23 '22 12:02 roland-KA

And as promised, here is the code to plot a decision tree using Graphs and GraphMakie. This variant needs a bit more coding than GraphRecipes & Plots since Graphs expects the tree in breadth-first order, whereas the structures of the decision tree are better suited for a depth-first traversal.

So I'm doing basically a depth-first traversal of the decision tree but I collect the nodes (as well as the information for the labels) in a breadth-first order. Sounds a bit strange (and is a bit strange 😀), but can be done with relatively little effort.

The trick is the use of the counter dictionary below, which has a counter for each level of the (binary) tree and delivers a numbering of the nodes in breadth-first order (1st level starts with 1, 2nd level with 2, 3rd level with 4 etc.).

The function traverse_tree visits each node, adds that node's label information to label and adds the edges to its left and right children to the graph g (in the call to add_node!). We assume that the decision tree is in tree and has been created beforehand.

using Graphs, GraphMakie, CairoMakie

depth = DecisionTree.depth(tree)
g = SimpleDiGraph(2^(depth+1)-1)  # digraph with one vertex for each potential node of the complete binary tree
label = Dict{Int, String}()  # for collecting label information
counter = Dict([i => 2^i for i in 0:depth])  # breadth-first index of the first node on each level

function traverse_tree(tree::DecisionTree.Node, level::Int, cnt::Dict{Int, Int}, g::SimpleDiGraph, label::Dict{Int, String})
	label[cnt[level]] = "ID = " * string(tree.featid) * " - " * string(tree.featval)
	add_node!(g, tree.left, level, cnt, label) 	# left child
	add_node!(g, tree.right, level, cnt, label) # right child
end

traverse_tree(tree::DecisionTree.Leaf, level::Int, cnt::Dict{Int, Int}, g::SimpleDiGraph, label::Dict{Int, String}) = 
	label[cnt[level]] = "maj = " * string(tree.majority)

function add_node!(g::SimpleDiGraph, node::Union{DecisionTree.Node, DecisionTree.Leaf}, level::Int, cnt::Dict{Int, Int}, label::Dict{Int, String})
	add_edge!(g, cnt[level], cnt[level+1])
	traverse_tree(node, level + 1, cnt, g, label)
	cnt[level+1] += 1
end

The function traverse_tree has to be applied to the initial data structures defined above and a level of 0. Afterwards we have a tree structure in g similar to the decision tree and all label information in label. The keys of label give us the (breadth-first) order we need for plotting. Therefore we have to extract it in that order (call to sort) and then everything can be plotted using graphplot.

traverse_tree(tree, 0, counter, g, label)
tree_labels = collect(values(sort(label)))
f, ax, p = graphplot(g, layout = layout, nlabels = tree_labels, nlabels_attr=(;justification = :center, align = (:center, :top), color = :blue))
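
Note that layout in the graphplot call must be defined beforehand; one plausible choice (an assumption, not part of the original snippet) is the Buchheim tree layout from NetworkLayout.jl:

using NetworkLayout

# A layered tree layout for the breadth-first-numbered digraph built above.
layout = Buchheim()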

The resulting tree plot is the one depicted in JuliaPlots/GraphMakie.jl#57.

roland-KA avatar Feb 23 '22 13:02 roland-KA

Before noticing this issue, I went ahead and adapted the Plots recipe used in EvoTrees for this package. It's rather hacky, because the node shapes don't play well with the text, especially for deep trees, but between custom markers and bypassing the labels with annotations, it mostly works out. I ended up doing basically the same node traversal to get a digraph. Let me know if you think it's worth making a pull request with some modifications (either here or in some eventual MLJPlotRecipes).

using RecipesBase, Graphs, NetworkLayout  # RecipesBase for @recipe, Graphs for SimpleDiGraph, NetworkLayout for buchheim
import DecisionTree
@recipe function plot(tree::DecisionTree.Node, var_names::Vector{Symbol}; widthAdjust=0.8, nodeWidth=0.45, falseColor="#FFC7AC", trueColor="#DAFBAA", decisionColor="#D6CCFC")
    g, reorderedNames, reorderedValues, reorderedLabels, n = buildGraph(tree, var_names)
    adjList = g.fadjlist
    size_base = floor(log2(length(adjList)))
    sz = (128 * 2^(size_base * widthAdjust), 96 * (1 + size_base))
    xBuf, yBuf = (nodeWidth, 0.45)
    buch = buchheim(adjList)
    xBuch = [x[1] for x in buch]
    yBuch = [x[2] for x in buch]
    shapes = [[(xBuch[ii] - xBuf, yBuch[ii] + yBuf), (xBuch[ii] + xBuf, yBuch[ii] + yBuf), (xBuch[ii] + xBuf, yBuch[ii] - yBuf - 0.15), (xBuch[ii] - xBuf, yBuch[ii] - yBuf - 0.15)] for ii = 1:length(xBuch)]
    curves = Vector{Tuple{Vector{Float64},Vector{Float64}}}(undef, sum(length.(adjList)))
    iCurve = 1
    for ii = 1:length(reorderedLabels)
        for jj = 1:length(adjList[ii])
            curves[iCurve] = ([xBuch[ii], xBuch[adjList[ii][jj]]], [shapes[ii][3][2], shapes[adjList[ii][jj]][1][2]])
            iCurve += 1
        end
    end
    annotate = [(xBuch[ii], yBuch[ii], reorderedLabels[ii], 9) for ii = 1:length(reorderedLabels)]
    background_color --> :white
    linecolor --> :black
    legend --> nothing
    axis --> nothing
    framestyle --> :none
    size --> sz
    annotations --> annotate
    for ii = 1:length(shapes)
        @series begin
            if length(adjList[ii]) == 0
                if reorderedValues[ii] > 0.5
                    fillColor = trueColor
                else
                    fillColor = falseColor
                end
            else
                fillColor = decisionColor
            end
            fillcolor --> fillColor
            seriestype --> :shape
            return shapes[ii]
        end
    end
    for ii = 1:length(curves)
        @series begin
            seriestype --> :curves
            return curves[ii]
        end
    end
end
function buildGraph(tree::DecisionTree.Node, givenNames)
    g = SimpleDiGraph()
    reorderedNames = Vector{String}()
    reorderedValues = tuple()
    reorderedLabels = Vector{String}()
    g, reorderedNames, reorderedValues, reorderedLabels, n = addNode(tree, g, givenNames, reorderedNames, reorderedValues, reorderedLabels, 1)
    return g, reorderedNames, reorderedValues, reorderedLabels, n
end
function addNode(node::DecisionTree.Node, g, givenNames, reorderedNames, reorderedValues, reorderedLabels, n)
    add_vertex!(g)
    append!(reorderedNames, [String(givenNames[node.featid])])
    reorderedValues = (reorderedValues..., node.featval)
    if node.featval isa Float64
        showVal = round(node.featval, sigdigits=3)
    else
        showVal = node.featval
    end
    append!(reorderedLabels, ["$(String(givenNames[node.featid]))\n ≥ $(showVal)"])
    g, reorderedNames, reorderedValues, reorderedLabels, nLeft = addNode(node.left, g, givenNames, reorderedNames, reorderedValues, reorderedLabels, n + 1)
    add_edge!(g, n, n + 1)
    g, reorderedNames, reorderedValues, reorderedLabels, nRight = addNode(node.right, g, givenNames, reorderedNames, reorderedValues, reorderedLabels, nLeft + 1)
    add_edge!(g, n, nLeft + 1)
    return g, reorderedNames, reorderedValues, reorderedLabels, nRight
end
function addNode(node::DecisionTree.Leaf, g, givenNames, reorderedNames, reorderedValues, reorderedLabels, n)
    add_vertex!(g)
    leafVal = sum(node.values .== 2) // length(node.values)
    append!(reorderedNames, [String(givenNames[end])])
    append!(reorderedLabels, ["$(String(givenNames[end])):\n $(leafVal.num)/$(leafVal.den)"])
    reorderedValues = (reorderedValues..., Float64(leafVal))
    return g, reorderedNames, reorderedValues, reorderedLabels, n
end

As for an example of the resulting graph, using the example above with some name changes:

using MLJ, DecisionTree
using Random, DataFrames, Tables
using Plots
X, y = make_blobs(300; rng=MersenneTwister(1234))
df = DataFrame(theFirstThing=Tables.matrix(X)[:, 1], theSecondThing=Tables.matrix(X)[:, 2])
dtc = @load DecisionTreeClassifier pkg = DecisionTree verbosity = 0
dtc_model = dtc(min_purity_increase=0.005, min_samples_leaf=1, min_samples_split=2, max_depth=3)
dtc_mach = machine(dtc_model, df, y)
MLJ.fit!(dtc_mach)
x = fitted_params(dtc_mach)
Plots.plot(x[:tree], [x[:features]..., :y])

[image: decision tree plotted with the Plots recipe]

dsweber2 avatar Apr 08 '22 23:04 dsweber2

Hi @dsweber2, this looks quite good to me! 👍 ... and as you've realized the implementation using Plots recipes, we get the independence between MLJ and the graphics package (as discussed above).

@ablaom, wouldn't this be a good start for a MLJPlotRecipes package?

roland-KA avatar Apr 12 '22 10:04 roland-KA

I agree this looks nice. However, as it is specific to DecisionTree.jl trees, I suggest a PR to DecisionTree.jl and/or MLJDecisionTreeInterface.jl. The MLJDecisionTreeInterface version could include the original feature names.

Minor suggestion: replace the test if node.featval isa Float64 with if node.featval isa AbstractFloat.
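
Concretely, the suggested tweak to the label formatting in the recipe would be:

# Broader float check when rounding split values for the node labels:
showVal = node.featval isa AbstractFloat ? round(node.featval, sigdigits=3) : node.featval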

However, as commented above, a more maintainable solution would be to implement the AbstractTrees.jl API for the DecisionTree.jl objects and try to improve the generic tree-plotting capabilities of Plots.jl, if these are lacking. It's just a pity that this approach will not be able to include feature names without some significant changes to DecisionTree.jl, as node objects in DecisionTree don't know the feature names. Perhaps in the MLJ wrapper we could include the encoding in a plot legend?

ablaom avatar Apr 19 '22 01:04 ablaom

I've just noticed that DecisionTree.jl is no longer maintained. So a PR to this package is no longer an option.

Anyway @ablaom, as you suggest, a more maintainable solution would be less dependent on DecisionTree.jl. As I'm not so familiar with plot recipes (I've just used them, never implemented one), I'm trying to understand what that means:

  • @dsweber2's recipe would then have an AbstractTree as its first argument (instead of a DecisionTree.Node), right?
  • And it should be a GraphRecipe replacing the TreePlot recipe in my example above?

As that example above shows, making an AbstractTree from a DecisionTree can be achieved relatively easily: only the AbstractTrees functions children and printnode have to be implemented.

An obstacle for a simple implementation is indeed that DecisionTree doesn't know the feature names. In this tutorial, I've explained how that information could be added. But it is just sort of a workaround, not a desirable solution. So if MLJ could deliver that information directly, a simple implementation would be possible. @ablaom could you describe more precisely what that 'encoding in a plot legend' would look like?

@dsweber2 what do you think about such an adaption of your plot recipe? Is this a way to go?

roland-KA avatar Apr 19 '22 21:04 roland-KA

@dsweber2's recipe would then have an AbstractTree as its first argument (instead of a DecisionTree.Node), right?

I think the problem here is that "AbstractTree" is not a type, only an interface. As far as I understand, plot recipes cannot be created in the usual way with trait-dispatch, only type-dispatch. But maybe there is a workaround if you don't use the macro.

@ablaom could you describe more precisely what that 'encoding in a plot legend' would look like?

I just mean that your nodes are labelled like " Feature 1 < 0.4", say, but you add a legend to the figure that looks like

Feature 1 - number_of_bedrooms
Feature 2 - floor_area
Feature 3 - median_price_neighborhood
...

This might even be an advantage, since long feature names then don't mess up the plot.

@bensadeghi Would you be happy to entertain a PR that implements the AbstractTrees.jl interface for Node and Leaf objects? This would be pretty minimal - see above comment. AbstractTrees.jl is a popular package with no dependencies. The existing print_tree functionality could be left untouched, or replaced by the AbstractTrees.jl (text-based) version, which would eliminate some code.
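
For reference, the minimal interface in question would presumably look something like the sketch below (the Leaf method for children is an assumption added to the methods already shown earlier in this thread):

using AbstractTrees
import DecisionTree

AbstractTrees.children(node::DecisionTree.Node) = (node.left, node.right)
AbstractTrees.children(leaf::DecisionTree.Leaf) = ()  # leaves terminate the recursion

# printnode as sketched earlier in the thread, e.g.:
AbstractTrees.printnode(io::IO, node::DecisionTree.Node) = print(io, "ID: ", node.featid, " - ", node.featval)
AbstractTrees.printnode(io::IO, leaf::DecisionTree.Leaf) = print(io, "maj: ", leaf.majority, " - vals: ", length(leaf.values))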

ablaom avatar Apr 19 '22 22:04 ablaom

@bensadeghi Would you be happy to entertain a PR that implements the AbstractTrees.jl interface for Node and Leaf objects? This would be pretty minimal - see above comment.

@ablaom, that sounds fine. But please make sure that the PR includes appropriate unit tests and documentation in the README.

bensadeghi avatar Apr 21 '22 07:04 bensadeghi

@roland-KA Is this something you might take on, at your leisure?

ablaom avatar Apr 21 '22 20:04 ablaom

Well, at least I could give it a try. But I need some help, especially when it comes to implementing a plot recipe. @dsweber2 would you give your support on that topic?

So just to make clear what the objectives are, let me summarize. We want

  • a PR to DecisionTree.jl that implements
    • an AbstractTree wrapper (by implementing children and printnode)
    • a plot recipe that is able to plot a DecisionTree (in the form of an AbstractTree)

Is that correct?

@ablaom and you would provide an extension to the MLJ wrapper that delivers the feature names (perhaps in that extended 'legend' form you described above) as well as the class names?

So that in the end we could call plot (more or less) in the following way:

plot(aDecisionTree_that_acts_like_an_AbstractTree, 
         MLJ.feature_names, 
         MLJ.class_names)

roland-KA avatar Apr 24 '22 17:04 roland-KA

Spending a bit more time today on the issue, I think, I missed the point a bit with my last comment.

@ablaom your idea is probably that the PR to DecisionTree.jl only encompasses the AbstractTree wrapper?

A plot recipe for plotting decision trees could then be added to MLJ itself, relying just on the AbstractTree interface. That would open the possibility for every other decision tree within the MLJ universe to use the same generic plot recipe (it would only have to implement the AbstractTree interface). Right?

roland-KA avatar Apr 25 '22 19:04 roland-KA

Yes, that's exactly my suggestion. A PR here to implement the AbstractTrees.jl API and that is all (@bensadeghi's other requirements notwithstanding). Very happy to provided feedback on such a PR.

ablaom avatar Apr 25 '22 23:04 ablaom

Ok fine, that should be no problem 😊👍.

Then we have to decide how we get the information about feature names and class labels into the game (especially when the nodes of the trees don't have that information, as is the case for DecisionTree.jl).

The idea in AbstractTrees is that the function printnode produces the text that should be displayed inside the nodes (and leaves). So this function needs access to feature names and class labels.

Currently I see the following alternatives on how this could be done:

  1. The 'pure' nodes and leaves that come from a decision tree can be wrapped in an enriched structure which has that additional knowledge. That's the variant I described in my article on Towards Data Science (a minimal sketch of such a wrapper is given after this list). In this case printnode can deliver a ready-to-use text.
  2. The information is added at a later step, as arguments to the plot recipe (which would result in a usage of the recipe like plot(decisiontree, feature_names, class_labels)). Here the result of printnode would have to be combined with the additional information (in the best case) or (more probably) it has to be ignored and replaced by something else. I.e. the result of printnode would only be used for a simple default representation.
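
A minimal sketch of the wrapper idea in approach 1 might look as follows (type and field names are hypothetical, not existing DecisionTree.jl API; the leaf label assumes majority is an integer class id, as with MLJ-fitted trees):

using AbstractTrees
import DecisionTree

# Hypothetical wrapper types carrying the information the raw tree lacks.
struct InfoNode
    node::DecisionTree.Node
    featurenames::Vector{String}
    classlabels::Vector{String}
end

struct InfoLeaf
    leaf::DecisionTree.Leaf
    classlabels::Vector{String}
end

wrap(n::DecisionTree.Node, names, labels) = InfoNode(n, names, labels)
wrap(l::DecisionTree.Leaf, names, labels) = InfoLeaf(l, labels)

AbstractTrees.children(n::InfoNode) = (wrap(n.node.left, n.featurenames, n.classlabels),
                                       wrap(n.node.right, n.featurenames, n.classlabels))
AbstractTrees.children(::InfoLeaf) = ()

# printnode can now show the feature name instead of just the feature id ...
AbstractTrees.printnode(io::IO, n::InfoNode) = print(io, n.featurenames[n.node.featid], " - ", n.node.featval)

# ... and the class label instead of an integer id (assuming MLJ-style ids;
# with the native interface, majority may already be the label itself).
AbstractTrees.printnode(io::IO, l::InfoLeaf) = print(io, "maj: ", l.classlabels[l.leaf.majority])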

Approach 1 has to be implemented within the implementation of the AbstractTrees traits, whereas no. 2 would be implemented in the context of the recipe (the implementation of the TreePlot recipe may serve as an example).

I.e. approach 1 puts more burden on the implementation of the AbstractTrees traits when the tree structure doesn't have that information. But it would be relatively straightforward for a decision tree implementation like BetaML.jl (as far as I understand that code, it has labels and names included).

Approach 2 would allow parameters on the plot recipe to fine-tune the appearance of the tree. E.g. one could choose between showing just feature ids in the nodes, full-text nodes, or ids in the nodes plus a legend with the full text next to the tree plot (well, this would also be doable with approach 1, but it would be sort of a waste of effort).

But perhaps I'm missing a point at the moment and things are less complicated ... 🤔

roland-KA avatar Apr 26 '22 13:04 roland-KA

@roland-KA I'd suggest that for now we avoid writing plot recipes that are specific to ML decision trees and instead work on building tree representations that implement AbstractTrees.jl, which can then be thrown at generic plotting algorithms - whether Makie or Plots.jl. So my vote is for 1. I think this is simpler and easier to maintain.

I very much like your idea to build an enhanced AbstractTree object for DecisionTree.jl and had not realised how far you had already gone with this in the Towards Data Science post - that's a complete POC really. So there is a way to get the feature names into the game after all. This is only slightly more complicated than implementing the AbstractTrees API for raw nodes and leaves, and I expect @bensadeghi would not object to its addition here, as there is still no refactoring of existing code.

ablaom avatar Apr 28 '22 05:04 ablaom

The objective of using an AbstractTree is to hide all implementation details of specific decision trees from the plot recipe. I've investigated possible alternatives to the two approaches described above, but didn't find any other way to do it.

So I completely agree with you and would implement the concept from my TDS article.

BTW: In the meantime I've delved a bit into the documentation of plot recipes and I think I can do that too. 🤓

roland-KA avatar Apr 28 '22 10:04 roland-KA

I'm currently testing my implementation of the AbstractTree-interface for DecisionTree.jl.

Here I came across the fact that DecisionTree stores in the majority field of its Leafs sometimes the class labels and in some cases the ids of class labels (i.e. an index value).

Is it correct that it stores there

  • class labels when used with its native interface (outside of MLJ)
  • ids of class labels when used within MLJ?

roland-KA avatar May 04 '22 17:05 roland-KA

Off the top of my head I don't know the answer to the first point. Is there even a distinction in native DecisionTree.jl between ids and class labels?

Currently, the MLJ interface always converts target class labels (categorical values) to integers before passing them to DecisionTree.jl.

Features are left untouched and DecisionTree.jl is happy with any eltype which supports <, I'm pretty sure.
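
A hedged illustration of that conversion, using MLJ's categorical data utilities (the commented values are just what one would expect):

using MLJ

# A categorical target as an MLJ user would supply it:
y = coerce(["setosa", "versicolor", "setosa", "virginica"], Multiclass)

levels(y)   # the class labels, e.g. ["setosa", "versicolor", "virginica"]
int.(y)     # the integer references passed on to DecisionTree.jl, e.g. [1, 2, 1, 3]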

Does this help?

ablaom avatar May 05 '22 01:05 ablaom

Let me know if you need me to look further into this.

ablaom avatar May 05 '22 02:05 ablaom

Ah that makes sense and fits well with my observations!

Up to now I only dealt with DecisionTrees that were produced by MLJ, so I was a bit surprised to encounter trees with class labels included. But with your explanation things are clear now.

roland-KA avatar May 05 '22 13:05 roland-KA

Hi @ablaom, the PR for DecisionTree.jl is now available. The automated tests run by GitHub failed, though; I don't know why.

Could you have a look at the PR?

In src/DecisionTree.jl I've exported (among others) the functions children and printnode (coming from AbstractTrees). Any opinion on whether this is good or bad?

roland-KA avatar May 06 '22 14:05 roland-KA