
NaN leaf score

Open JordanCheney opened this issue 9 years ago • 2 comments

Hello,

First of all, thank you for open sourcing this code. It is excellent. I'm encountering an error during training where a leaf node can receive a NaN score; after this happens, training freezes. The error has to occur in the following block of code:

if (node_idx >= nodes_n / 2) {
    // we are on a leaf node
    const int idx = node_idx - nodes_n / 2;
    double pos_w, neg_w;
    pos_w = neg_w = c.esp;  // initialize both sums with the small esp constant
    // accumulate the weights of the face samples that fall into this leaf
    for (int i = 0; i < pos_n; i++)
        pos_w += pos.weights[pos_idx[i]];
    // accumulate the weights of the non-face samples that fall into this leaf
    for (int i = 0; i < neg_n; i++)
        neg_w += neg.weights[neg_idx[i]];

    // leaf score is half the log-odds of face vs. non-face weight mass
    float score = 0.5 * log(pos_w / neg_w);
    // my workaround: zero out the score when it comes out as NaN
    scores[idx] = isnan(score) ? 0. : score;

    return;
}

I added the NaN check above the return myself to work around the issue, but I'm not sure that setting the score to 0 is the proper solution. Do you have any insight into better ways to avoid this problem?
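For what it's worth, one alternative to zeroing would be to clamp the leaf score to a finite range, so a "pure" leaf still pushes samples in the right direction. A rough sketch (kMaxLeafScore is an arbitrary bound I made up, not a constant from JDA):

#include <algorithm>
#include <cmath>

// Sketch: clamp the leaf score instead of zeroing it when a split is pure.
// kMaxLeafScore is an arbitrary bound, not something taken from JDA.
static double safe_leaf_score(double pos_w, double neg_w) {
    const double kMaxLeafScore = 4.0;
    double score = 0.5 * std::log(pos_w / neg_w);
    if (std::isnan(score)) return 0.0;  // degenerate case: fall back to neutral
    return std::max(-kMaxLeafScore, std::min(kMaxLeafScore, score));
}

The assignment in the snippet above would then become scores[idx] = safe_leaf_score(pos_w, neg_w);.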

JordanCheney commented Feb 11 '16 19:02

@JordanCheney pay attention to the leaf scores in the first few carts; they shouldn't be too large. The problem is caused by an internal node split that can leave a leaf node with no face samples or no non-face samples. The resulting leaf score can be unusually large and cause the sample weights to explode when exp() is calculated.
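To make that failure chain concrete, here is an illustration assuming a real-boosting style weight update of the form w = exp(-y * f(x)) (my shorthand for the update, not code from the repository): an extreme accumulated score overflows exp() to infinity, and a later leaf then computes pos_w / neg_w as inf / inf, which is NaN.

#include <cmath>
#include <cstdio>

int main() {
    // Suppose a pure leaf produced an extreme score; accumulated over a few
    // carts a sample's total score f can reach the hundreds.
    double f = 800.0;                 // hypothetical accumulated score
    // For a non-face sample (y = -1) the weight is exp(+f), which overflows.
    double w = std::exp(f);           // exp(800) overflows to +inf

    // A later leaf then sums such weights into pos_w and neg_w.
    double pos_w = w;                 // inf
    double neg_w = w;                 // inf
    double score = 0.5 * std::log(pos_w / neg_w);  // inf / inf -> NaN

    std::printf("w = %g, score = %g\n", w, score); // typically: w = inf, score = nan
    return 0;
}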

luoyetx commented Feb 13 '16 01:02

I understand that the math says "pure" splits (all face or all non-face) will essentially cause the scores to explode. This seems odd to me, though, as the goal of a tree should be to split the data perfectly, no? Of course that should be very difficult to accomplish in practice, but still. I suppose this isn't really a bug but a strange artifact of my data. For the record, I was able to train a full cascade using my fix above, but the scores weren't comparable to the paper, and I'm hoping this is the reason.

JordanCheney commented Feb 15 '16 21:02