sserdoubleh
You can change the data format in `interaction.py`: https://github.com/PaddlePaddle/Knover/blob/15d5279a4370b225b0c388a129b774c9469fcde4/interaction.py#L69

For example:

```python
personas = ["your persona: i have three cats."]
example = Example(src=" [SEP] ".join(personas + context), data_id=0)
```

You...
Sorry for the late reply. If you want to get the inference model, you can use this script: https://github.com/PaddlePaddle/Knover/blob/develop/scripts/local/save_inference_model.sh . But I think there may be some bugs in the usage...
We use the BST dataset in our fine-tuning stage. You can look at https://github.com/PaddlePaddle/Knover/blob/develop/data/example/valid.tsv , which is an example of using personas.
1. You can change the interact script (`knover/scripts/interact.py`) as described in this issue: https://github.com/PaddlePaddle/Knover/issues/24
2. If you interact with the PLATO-2 model, you can use personas directly.
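
For illustration, here is a minimal sketch of prepending personas to the dialogue context before it is handed to the model. `Example` below is a stand-in namedtuple and `build_example` is a hypothetical helper; neither is Knover's actual code. The `src` and `data_id` fields and the `" [SEP] "` separator are assumptions about what the interact script expects.

```python
from collections import namedtuple

# Stand-in for Knover's Example record (assumption: only these two fields matter here).
Example = namedtuple("Example", ["src", "data_id"])


def build_example(personas, context, data_id=0):
    """Join persona lines and dialogue turns into one source string."""
    # Personas come first, then the dialogue history, all separated by " [SEP] ".
    return Example(src=" [SEP] ".join(personas + context), data_id=data_id)


personas = ["your persona: i have three cats."]
context = ["hi , how are you ?", "i am fine . do you have pets ?"]
example = build_example(personas, context)
print(example.src)
```

In the real script the context would be the running list of user and bot turns, so the personas only need to be prepended at the point where the example is built.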
It is not that `_get_feed_dict` is unimplemented; you are looking in the wrong place: https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/knover/models/plato.py#L52 The error occurs because `latent_type_size` is not set in your config file: https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/projects/PLATO-2/32L.json#L13 See the location the error points to: https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/knover/models/plato.py#L172
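
A quick way to catch this before launching a job is to check the JSON config for the key up front. The sketch below is only an illustration: `missing_config_keys` is a hypothetical helper, and the config contents and values are placeholders rather than anything copied from the repository.

```python
import json

# Key named in the error above; extend the tuple if other keys turn out to be required.
REQUIRED_KEYS = ("latent_type_size",)


def missing_config_keys(config_text, required=REQUIRED_KEYS):
    """Return the required keys that are absent from a JSON config string."""
    config = json.loads(config_text)
    return [key for key in required if key not in config]


# Hypothetical config contents; the numbers are placeholders.
broken = '{"hidden_size": 1024}'
fixed = '{"hidden_size": 1024, "latent_type_size": 20}'

print(missing_config_keys(broken))  # ['latent_type_size']
print(missing_config_keys(fixed))   # []
```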
In my experience, data augmentation helps noticeably when the task has only a small amount of data.
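If the model structure is fairly simple, porting it will be easier. Porting mainly involves the model structure, tokenization, and aligning the data processing. If there is demand for this, we will consider releasing the related code later.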
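You can prepare your corpus by following the instructions in `Knover/README.md` ( https://github.com/PaddlePaddle/Knover/blob/master/README.md ). You can use the sentencepiece tool ( https://github.com/google/sentencepiece ) to generate the vocabulary; for the format, refer to `./package/dialog_en/voca.txt` and `./package/dialog_en/spm.model`. Alternatively, you can use an existing Chinese vocabulary.

If you use a different tokenizer (not the sentencepiece tokenizer), modify `./utils/tokenization.py` and implement your tokenizer (say, `BasicTokenizer`) following the implementation of `SentencePieceTokenizer`, then specify it in `train_args` in the config by adding the line `train_args="--tokenizer BasicTokenizer"`: https://github.com/PaddlePaddle/Knover/blob/15d5279a4370b225b0c388a129b774c9469fcde4/utils/tokenization.py#L124

For the details of training and configuration, also refer to `Knover/README.md`.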
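
To make the custom-tokenizer route more concrete, here is a rough sketch of what a whitespace-based `BasicTokenizer` could look like. The method names (`tokenize`, `convert_tokens_to_ids`, `convert_ids_to_tokens`), the constructor arguments, and the one-token-per-line vocabulary format are assumptions made for illustration; check `SentencePieceTokenizer` in `./utils/tokenization.py` for the interface Knover actually expects.

```python
class BasicTokenizer:
    """Whitespace tokenizer sketch; the interface is assumed, not confirmed against Knover."""

    def __init__(self, vocab_lines, unk_token="[UNK]"):
        # Assumed vocab format: one token per line (optionally followed by tab-separated
        # fields); the line index is used as the token id. unk_token must be in the vocab.
        self.vocab = {}
        for idx, line in enumerate(vocab_lines):
            token = line.rstrip("\n").split("\t")[0]
            self.vocab[token] = idx
        self.inv_vocab = {idx: token for token, idx in self.vocab.items()}
        self.unk_token = unk_token

    def tokenize(self, text):
        # Split on whitespace; the corpus is expected to be pre-segmented.
        return text.strip().split()

    def convert_tokens_to_ids(self, tokens):
        # Tokens missing from the vocabulary map to the unknown-token id.
        unk_id = self.vocab[self.unk_token]
        return [self.vocab.get(token, unk_id) for token in tokens]

    def convert_ids_to_tokens(self, ids):
        return [self.inv_vocab[i] for i in ids]


tokenizer = BasicTokenizer(["[PAD]", "[UNK]", "你好", "世界"])
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("你好 世界 !"))
print(ids)  # [2, 3, 1]
```

In practice the vocabulary lines would be read from your vocab file, and the class would be constructed the same way the existing tokenizer is, so that `--tokenizer BasicTokenizer` can select it.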
What is your PaddlePaddle version? I tried interacting with PLATO-XL on 4 V100 GPUs with 32GB RAM, and it works normally.
It doesn't support PLATO-2's interact mode yet. We will upgrade the dygraph branch later.