Xingchen Song(宋星辰)
@Chung-I I notice that you used cross entropy in ocd_loss rather than KL divergence (which is the official loss in the paper 'Optimal Completion Distillation for sequence learning'), is this PR a right...
Should ocd_loss look like this?

```python
optimal_probs = F.softmax(q_val / temp, dim=-1)
loss += (optimal_probs * (torch.log(optimal_probs) - F.log_softmax(out_probs[b, :len_sample, :], dim=-1))).sum(dim=-1).mean()
```
> Yes, as the paper indicated, the loss they used is KL divergence; however, when performing backprop in this scenario, the two losses are actually equivalent in terms of gradient...
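For anyone curious, here is a minimal sketch of why the two losses give the same gradient (made-up shapes and variable names, not the repo's actual code): KL(p‖q) = CE(p, q) − H(p), and the entropy term H(p) of the fixed target distribution does not depend on the model's logits, so it contributes nothing to the gradient.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical shapes: 4 decoding positions over a vocab of 10 tokens.
logits = torch.randn(4, 10, requires_grad=True)   # model outputs
q_val = torch.randn(4, 10)                        # OCD Q-values (constant target)
target = F.softmax(q_val, dim=-1)                 # "optimal" distribution, no gradient flows here

log_q = F.log_softmax(logits, dim=-1)

# Cross entropy:  -sum_i p_i * log q_i
ce = -(target * log_q).sum(dim=-1).mean()

# KL divergence:  sum_i p_i * (log p_i - log q_i) = CE(p, q) - H(p)
kl = (target * (target.log() - log_q)).sum(dim=-1).mean()

g_ce, = torch.autograd.grad(ce, logits, retain_graph=True)
g_kl, = torch.autograd.grad(kl, logits)

# H(p) is constant w.r.t. the logits, so both losses yield identical gradients.
print(torch.allclose(g_ce, g_kl, atol=1e-6))  # True
```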
We convert fractions with a two-stage method (see the sketch below):
- stage-1: tag and construct the fraction structure, e.g. "三分之二" (two thirds) ==> `fraction { denominator: "3" frac: "/" numerator: "2" }`
- stage-2: reorder...
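A minimal, self-contained sketch of the same idea in plain Python, using regexes as stand-ins for the actual Thrax grammars (the tag names mirror the structure above, but this is not the repo's real rule set):

```python
import re

# Illustrative digit map; the real grammar covers full Chinese numerals.
CN_DIGITS = {"一": "1", "二": "2", "三": "3", "四": "4", "五": "5",
             "六": "6", "七": "7", "八": "8", "九": "9", "十": "10"}

def tag_fraction(text: str) -> str:
    """Stage 1: recognize 'X分之Y' and emit a tagged fraction structure."""
    def repl(m):
        den = CN_DIGITS.get(m.group(1), m.group(1))
        num = CN_DIGITS.get(m.group(2), m.group(2))
        return f'fraction {{ denominator: "{den}" frac: "/" numerator: "{num}" }}'
    return re.sub(r"([一二三四五六七八九十])分之([一二三四五六七八九十])", repl, text)

def reorder_fraction(tagged: str) -> str:
    """Stage 2: reorder the tagged fields so the numerator precedes the denominator."""
    def repl(m):
        den, num = m.group(1), m.group(2)
        return f"{num}/{den}"
    return re.sub(r'fraction \{ denominator: "([^"]+)" frac: "/" numerator: "([^"]+)" \}',
                  repl, tagged)

print(reorder_fraction(tag_fraction("三分之二")))  # -> 2/3
```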
Hi, you can add "3721 三七二十一" to `chinese_text_normalization/thrax/src/cn/hotfix.list` and re-compile this project. This is a dirty workaround.
Met the same issue.
`is_masked = torch.ByteTensor(feature.pop("is_masked").copy().astype(np.uint8))` @menggehe
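For context, a minimal repro of why the explicit cast is needed (my own example, assuming `feature["is_masked"]` arrives as a boolean numpy array as in the original data pipeline): older PyTorch versions refuse to build a tensor directly from a `numpy.bool_` array, hence `astype(np.uint8)` before constructing the ByteTensor.

```python
import numpy as np
import torch

# Hypothetical feature dict; in the real pipeline this comes from the preprocessed records.
feature = {"is_masked": np.array([True, False, True, False])}

# Cast bool -> uint8 before handing the array to ByteTensor; copy() keeps the
# popped array independent of the original record.
is_masked = torch.ByteTensor(feature.pop("is_masked").copy().astype(np.uint8))
print(is_masked)  # tensor([1, 0, 1, 0], dtype=torch.uint8)
```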
> the q/k/v wrong!!

Can you point out where the wrong code is? I compared this IMPL with ZihangDai's IMPL and didn't find anything wrong.
@graykode Hi~ Great thanks for your PyTorch IMPL of XLNet. I wonder whether you have a plan to implement the fine-tuning part in PyTorch?
Hi, currently I'm working on other projects; I will keep tracking this when I have time.