Guolin Ke
@shiyu1994 can the new CUDA version run this example successfully?
It seems the failed tests are related to the SHAP calculation. @proto-n can you help fix the tests?
I see, thank you. In classification, the weight is actually the hessian, which depends on the gradient and is not constant. What you need is the data sample weight,...
It is not trivial to store data weights in the tree structure, as it would affect training efficiency. Although the hessian is not the weight, it is very close. So I...
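To illustrate why the hessian is not a constant per-sample weight, here is a minimal sketch using the standard binary log loss. It is not taken from LightGBM's source; the function names `sigmoid` and `logloss_grad_hess` and the `sample_weight` argument are hypothetical and chosen only for this illustration.

```python
import math

def sigmoid(score):
    # Map a raw score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-score))

def logloss_grad_hess(score, label, sample_weight=1.0):
    # Standard binary log loss derivatives with respect to the raw score.
    # Assumption: the sample weight simply scales both terms.
    p = sigmoid(score)
    grad = (p - label) * sample_weight
    hess = p * (1.0 - p) * sample_weight
    return grad, hess

# Same label and same sample weight, but different raw scores:
scores = [-2.0, 0.0, 2.0]
hessians = [logloss_grad_hess(s, 1.0)[1] for s in scores]

# The hessian changes with the current prediction, so it only
# approximates a data weight; it peaks at 0.25 when score == 0.
for s, h in zip(scores, hessians):
    print(f"score={s:+.1f} hessian={h:.4f}")
```

With a constant `sample_weight`, the hessian still varies with the raw score, which is why it is close to, but not the same as, the data weight.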
I am not familiar with SHAP, so I cannot make the decision. Kindly ping @slundberg for help with data weights.
Okay, you can try to match it by setting ```max_depth=10``` in LightGBM.
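As a rough sketch, the setting could be passed through a parameter dictionary like the one below. Only `max_depth=10` comes from the comment above; the `objective` value is a placeholder assumption for illustration.

```python
# Hypothetical parameter dictionary; only max_depth=10 is from the comment.
params = {
    "objective": "binary",  # assumption: replace with the actual task
    "max_depth": 10,        # limit tree depth, as suggested above
}

# Note: LightGBM also caps tree size with num_leaves (default 31),
# so a depth limit alone may not produce full depth-10 trees.
# The dictionary would typically be passed to lightgbm.train(params, ...).
print(params)
```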
please check the last paragraph at https://github.com/guolinke/TUPE#fine-tuning
Hi @Redaimao, 1. We use the first self-attention layer for the calculation, as the later layers have residual connections. 2. There is a `Dropout(LayerNorm(x))` applied to `word_emb+pos_emb` before the transformer. Since `LayerNorm(a +...
@Howal It is the raw text format. The wiki data is the output of wikiextractor. We don't handle the `newline` token specially; we just keep it as it is. The first...
@sowhatyc We crawled the book corpus on our own. There are two formats, txt and epub, and we save them separately.