Guolin Ke comments

Results: 163 comments of Guolin Ke

segfault during predict

@shiyu1994 can the new CUDA version run this example successfully?

Fixing treeshap calculation when data is weighted (fixes #5095)

It seems the failed tests are related to shap calculation. @proto-n can you help to fix the tests?

Fixing treeshap calculation when data is weighted (fixes #5095)

I see, thank you. In classification, the weight is actually the hessian, which depends on the gradient and is not a constant. What you actually need is the data sample weight,...
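To illustrate why a hessian is not a constant per-sample weight, here is a minimal sketch using the textbook logistic loss. It is not LightGBM's internal code; the function names `sigmoid` and `logloss_grad_hess` and the sample values are hypothetical, chosen only for illustration. With the same sample weight, the hessian still changes with the current raw score.

```python
import math

def sigmoid(z):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-z))

def logloss_grad_hess(raw_score, label, sample_weight=1.0):
    """Gradient and hessian of the textbook logistic loss w.r.t. the raw score.

    Assumption: label is 0 or 1. This is an illustrative formula,
    not LightGBM's exact objective implementation.
    """
    p = sigmoid(raw_score)
    grad = (p - label) * sample_weight
    hess = p * (1.0 - p) * sample_weight
    return grad, hess

# Same sample weight (2.0), different raw scores: the hessian varies.
results = {}
for score in (-2.0, 0.0, 2.0):
    grad, hess = logloss_grad_hess(score, label=1, sample_weight=2.0)
    results[score] = (grad, hess)
    print(f"score={score:+.1f} grad={grad:.4f} hess={hess:.4f}")
```

The hessian is largest when the predicted probability is near 0.5 and shrinks as the prediction becomes confident, so sums of hessians track, but do not equal, sums of data sample weights.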

Fixing treeshap calculation when data is weighted (fixes #5095)

It is not trivial to store data weights in the tree structure, as it would affect training efficiency. Although the hessian is not the weight, it is very close. So I...

Fixing treeshap calculation when data is weighted (fixes #5095)

I am not familiar with SHAP, so I cannot make the decision. Kindly ping @slundberg for help with data weights.

lightgbm: better matching hyperparams

Okay, you can try to match it by setting `max_depth=10` in LightGBM.
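As a rough sketch of that suggestion, the parameter dictionary below sets `max_depth` to 10. The `objective` value and the choice to raise `num_leaves` are assumptions added for illustration, not part of the original advice: a binary tree of depth 10 has at most 2^10 leaves, and LightGBM's default `num_leaves` of 31 would otherwise cap tree size well below that. The helper `max_leaves_for_depth` is hypothetical.

```python
def max_leaves_for_depth(depth):
    """Upper bound on leaf count for a binary tree of the given depth."""
    return 2 ** depth

params = {
    "objective": "binary",   # assumption: the task is not stated in the comment
    "max_depth": 10,         # the value suggested in the comment
    # Assumption: allow up to the depth-implied leaf count so that
    # max_depth, rather than num_leaves, is the binding limit.
    "num_leaves": max_leaves_for_depth(10),
}

print(params)

# With LightGBM installed, the dictionary could be passed to training, e.g.:
#   import lightgbm as lgb
#   booster = lgb.train(params, lgb.Dataset(X, label=y))
# where X and y are your own feature matrix and labels.
```

Whether `num_leaves` should be raised this far depends on the model being matched; a smaller value may generalize better.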

question of providing bert model integrated with TUPE

Please check the last paragraph at https://github.com/guolinke/TUPE#fine-tuning

How to calculate correlation in Figure 2?

Hi @Redaimao, 1. We use the first self-attention layer for the calculation, as the later layers have residuals. 2. Then, as there is `Dropout(LayerNorm(x))` for `word_emb+pos_emb` before the transformer. Since `LayerNorm(a +...

What's the format of the raw data?

@Howal It is the raw text format. The wiki data is the output of wikiextractor. We don't handle the `newline` token specially; we just keep it as it is. The first...

What's the format of the raw data?

@sowhatyc We crawled the book corpus on our own. There are two formats, txt and epub, and we save them separately.