zengxianfeng

18 comments by zengxianfeng

I think the simplest approach here is to split your data into several shards and translate them in parallel with multiple workers.
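A minimal sketch of that shard-and-parallelize idea: split the input lines into N shards, translate each shard in its own worker, then re-interleave the outputs. `translate_lines` is a hypothetical stand-in for a real model call (e.g. a fairseq generate step); threads are used here only to keep the sketch self-contained, whereas real MT workers would be separate processes or jobs.

```python
from concurrent.futures import ThreadPoolExecutor

def translate_lines(lines):
    # Placeholder for a real translation call; here it just upper-cases
    # so the sketch is runnable without a model.
    return [line.upper() for line in lines]

def translate_parallel(lines, num_workers=4):
    # Round-robin split into num_workers shards.
    shards = [lines[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as ex:
        results = list(ex.map(translate_lines, shards))
    # Re-interleave so the output order matches the input order.
    out = [None] * len(lines)
    for w, shard_out in enumerate(results):
        out[w::num_workers] = shard_out
    return out
```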

> How do you conduct the parallel execution?

I am not so sure, but it seems that qsub results in a slower execution, while qrsh makes fsdp and no_c10d...

> No, it is not this script. It is one step before this execution.

Since there is a `nnodes` there, are you using some cloud service for computation?...

> Instead of having one folder as a `$data_dir`, like `fairseq-train /data-bin/`, use more folders, like `fairseq-train chat1:chat2:chat3:....:chatN`, and make sure each chat_i folder has a train.bin and train.idx. Then fairseq...
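The multi-folder setup above can be sketched as follows. The `data-bin/chat*` paths and three-shard count are illustrative, not from the thread; fairseq accepts colon-separated data directories and cycles through them across epochs.

```shell
# Lay out one folder per shard, each with its own train.bin / train.idx
# (in practice these come from running fairseq-preprocess per shard).
for i in 1 2 3; do
  mkdir -p "data-bin/chat$i"
  touch "data-bin/chat$i/train.bin" "data-bin/chat$i/train.idx"
done

# Join the shard folders with ':' and pass the result as the data argument.
DATA_DIRS=$(printf 'data-bin/chat%s:' 1 2 3 | sed 's/:$//')
echo "$DATA_DIRS"

# fairseq-train "$DATA_DIRS" ...   # loads one shard per epoch
```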

> Are you using something like `--num-shards --shard-id`? They are different from `:`. If you use `:`, the previous shard's memory should be released.

Anyway, in this case, 170GB per...

> We don't implement enc-dec attention. But I only use it for the encoder, which has no enc-dec attention. And I didn't handle the processing for CLS, which does not occur...

It seems to be the minimum of the left and right entropy, plus the mutual information, multiplied by the word's probability.
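A minimal sketch of the scoring the comment recalls, common in Chinese new-word discovery: score a candidate word by the minimum of its left/right neighbor entropies plus its PMI, scaled by the word's probability. The function names and exact weighting are illustrative; the original tool may combine the terms differently.

```python
import math
from collections import Counter

def entropy(neighbors):
    """Shannon entropy of the neighbor distribution (a list of tokens
    seen immediately to one side of the candidate word)."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

def word_score(p_word, pmi, left_neighbors, right_neighbors):
    """(min(left entropy, right entropy) + PMI) * P(word),
    as described in the comment above."""
    h = min(entropy(left_neighbors), entropy(right_neighbors))
    return (h + pmi) * p_word
```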

Does anyone have improvements for this? It is also too slow on my end.

> Hi @SefaZeng
> This issue also happens with my code: invalid transitions (e.g. O I-PER) are produced by the BiLSTM-CRF model.

The issue is sadly not trivial...
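One common remedy for the invalid-transition problem above is to constrain the CRF's transition matrix so moves like O → I-PER are disallowed. A minimal sketch of the BIO constraint check (real taggers typically bake this into the transition scores as a large negative value rather than calling a predicate at decode time):

```python
def allowed_transition(prev_tag, curr_tag):
    """BIO constraint: I-X may only follow B-X or I-X of the same
    entity type X; every other transition is permitted.
    Forbids e.g. O -> I-PER and B-LOC -> I-PER."""
    if curr_tag.startswith("I-"):
        entity = curr_tag[2:]
        return prev_tag in ("B-" + entity, "I-" + entity)
    return True
```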