zengxianfeng

18 comments by zengxianfeng

I think the simplest approach here is to split your data into several shards and translate them in parallel with multiple workers.
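A minimal sketch of that shard-and-parallelize idea: split the input lines into N shards, translate each shard in its own worker, then re-interleave the outputs. `translate_lines` is a hypothetical stand-in for a real model call (e.g. a fairseq generate step); threads are used here only to keep the sketch self-contained, whereas real MT workers would be separate processes or jobs.

```python
from concurrent.futures import ThreadPoolExecutor

def translate_lines(lines):
    # Placeholder for a real translation call; here it just upper-cases
    # so the sketch is runnable without a model.
    return [line.upper() for line in lines]

def translate_parallel(lines, num_workers=4):
    # Round-robin split into num_workers shards.
    shards = [lines[i::num_workers] for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as ex:
        results = list(ex.map(translate_lines, shards))
    # Re-interleave so the output order matches the input order.
    out = [None] * len(lines)
    for w, shard_out in enumerate(results):
        out[w::num_workers] = shard_out
    return out
```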

> How do you conduct the parallel execution?

I am not so sure, but it seems that qsub results in a slower execution, while qrsh makes fsdp and no_c10d...

> No, it is not this script. It is one step before this execution.

Since there is a `nnodes` there, are you using some cloud service for computation?...

> Instead of having one folder as a `$data_dir`, like `fairseq-train /data-bin/`, use more folders, like `fairseq-train chat1:chat2:chat3:....:chatN`, and make sure each chat_i folder has a train.bin and train.idx. Then fairseq...
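The multi-folder setup above can be sketched as follows. The `data-bin/chat*` paths and three-shard count are illustrative, not from the thread; fairseq accepts colon-separated data directories and cycles through them across epochs.

```shell
# Lay out one folder per shard, each with its own train.bin / train.idx
# (in practice these come from running fairseq-preprocess per shard).
for i in 1 2 3; do
  mkdir -p "data-bin/chat$i"
  touch "data-bin/chat$i/train.bin" "data-bin/chat$i/train.idx"
done

# Join the shard folders with ':' and pass the result as the data argument.
DATA_DIRS=$(printf 'data-bin/chat%s:' 1 2 3 | sed 's/:$//')
echo "$DATA_DIRS"

# fairseq-train "$DATA_DIRS" ...   # loads one shard per epoch
```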

> Are you using something like `--num-shards --shard-id`? They are different from `:`. If you use `:`, the previous shard's memory should be released.

Anyway, in this case, 170GB per...

> We don't implement enc-dec attention. But I only use it for the encoder, which has no enc-dec attention. And I didn't handle the processing for CLS, which does not occur...

It seems to be the minimum of the left and right entropy, plus the mutual information, multiplied by the word's probability.
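A minimal sketch of the scoring the comment recalls, common in Chinese new-word discovery: score a candidate word by the minimum of its left/right neighbor entropies plus its PMI, scaled by the word's probability. The function names and exact weighting are illustrative; the original tool may combine the terms differently.

```python
import math
from collections import Counter

def entropy(neighbors):
    """Shannon entropy of the neighbor distribution (a list of tokens
    seen immediately to one side of the candidate word)."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    return -sum(c / total * math.log(c / total) for c in counts.values())

def word_score(p_word, pmi, left_neighbors, right_neighbors):
    """(min(left entropy, right entropy) + PMI) * P(word),
    as described in the comment above."""
    h = min(entropy(left_neighbors), entropy(right_neighbors))
    return (h + pmi) * p_word
```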

Does anyone have improvements for this? It is also too slow on my end.

> Hi @SefaZeng
> This issue also happens with my code: invalid transitions (e.g. O I-PER) are produced by the BiLSTM-CRF model.

The issue is sadly not trivial...
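One common remedy for the invalid-transition problem above is to constrain the CRF's transition matrix so moves like O → I-PER are disallowed. A minimal sketch of the BIO constraint check (real taggers typically bake this into the transition scores as a large negative value rather than calling a predicate at decode time):

```python
def allowed_transition(prev_tag, curr_tag):
    """BIO constraint: I-X may only follow B-X or I-X of the same
    entity type X; every other transition is permitted.
    Forbids e.g. O -> I-PER and B-LOC -> I-PER."""
    if curr_tag.startswith("I-"):
        entity = curr_tag[2:]
        return prev_tag in ("B-" + entity, "I-" + entity)
    return True
```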