Document-Transformer
Improving the Transformer translation model with document-level context
Data preprocessing
Hello: I am a student at Peking University working on document-level NMT. If it is convenient, could you provide the dataset mentioned in the paper? Also, could you release the data preprocessing scripts? What preprocessing should be applied to the Chinese side? Looking forward to your reply. Best wishes!
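Below is a minimal sketch of a typical Chinese-side preprocessing pipeline, not the authors' official one: word segmentation with jieba followed by BPE via subword-nmt. The file names (`bpe.zh.codes`, `train.zh`, `train.tok.bpe.zh`) are placeholders for illustration.

```python
# Hedged sketch: assumes jieba and subword-nmt are installed and that a BPE
# codes file has already been learned; none of this is confirmed to match the
# preprocessing used in the paper.
import jieba
from subword_nmt.apply_bpe import BPE

with open("bpe.zh.codes", encoding="utf-8") as codes:  # hypothetical codes file
    bpe = BPE(codes)

with open("train.zh", encoding="utf-8") as src, \
     open("train.tok.bpe.zh", "w", encoding="utf-8") as out:
    for line in src:
        tokens = " ".join(jieba.cut(line.strip()))    # Chinese word segmentation
        out.write(bpe.process_line(tokens) + "\n")    # subword (BPE) segmentation
```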
Compared with training on the 2M zh-en corpus, what parameters should I use when training the model on a 940k zh-en corpus? I have tried batch_size=25k,...
When I use the context-level model for decoding and testing, is the parameter MODEL-PATH a folder containing all models, or a single model file? If it is the former, when...
Hello~ When I use this code to train a model, what format should the source corpus, the target corpus, and the context corpus be in? Should they be tokenized and BPE-encoded?...
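One plausible arrangement, offered only as an assumption and not confirmed by the repository, is that all three files are line-aligned (one sentence per line, tokenized and BPE-encoded like the source), with each context line holding the preceding source sentences of the same document. The sketch below builds such a context file; `train.tok.bpe.zh`, `train.ctx.zh`, and the window size K are hypothetical.

```python
# Hedged sketch: derive a context corpus from a tokenized/BPE'd source corpus
# by concatenating the previous K source sentences. This is an assumed format,
# not the repository's documented one.
K = 2  # number of preceding sentences used as context (assumed)

def build_context(src_sentences, k=K):
    """Return one context line (possibly empty) per source sentence."""
    return [" ".join(src_sentences[max(0, i - k):i])
            for i in range(len(src_sentences))]

with open("train.tok.bpe.zh", encoding="utf-8") as f:
    sents = [line.strip() for line in f]

with open("train.ctx.zh", "w", encoding="utf-8") as out:
    out.write("\n".join(build_context(sents)) + "\n")
```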
@Glaceon31 Thank you in advance!
Hello, I have now finished training the Transformer model and need to continue training following the instructions in the README, but the problem is that I do not know the exact format of the source corpus, target corpus, and context corpus files. If possible, could you provide the relevant files? Looking forward to your reply. Best wishes. My email is [email protected]