sserdoubleh

Results 74 comments of sserdoubleh

You can change the data format in `interaction.py`: https://github.com/PaddlePaddle/Knover/blob/15d5279a4370b225b0c388a129b774c9469fcde4/interaction.py#L69

For example:

```python
personas = ["your persona: i have three cats."]
example = Example(src=" [SEP] ".join(personas + context), data_id=0)
```

You...
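To make the resulting format concrete, here is a minimal, self-contained sketch of how the persona lines and the dialogue context end up in a single `src` string. The `build_src` helper and the sample turns are hypothetical and only for illustration; in `interaction.py` the joined string is what gets passed to `Example`.

```python
def build_src(personas, context, sep=" [SEP] "):
    """Join persona lines and dialogue turns into one source string."""
    return sep.join(personas + context)


personas = ["your persona: i have three cats."]
# Hypothetical dialogue turns, oldest first.
context = ["hi , how are you ?", "fine , thanks ."]

src = build_src(personas, context)
print(src)
```

The personas simply come first, followed by the turns, all separated by ` [SEP] `.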

Sorry for the late reply. If you want to get the inference model, you can use this script: https://github.com/PaddlePaddle/Knover/blob/develop/scripts/local/save_inference_model.sh . But I think there may be some bugs in the usage...

We use the BST dataset in our fine-tuning stage. You can look at https://github.com/PaddlePaddle/Knover/blob/develop/data/example/valid.tsv , which is an example of using personas.

1. You can change the interact script (`knover/scripts/interact.py`) as described in this issue: https://github.com/PaddlePaddle/Knover/issues/24
2. If you interact with the PLATO-2 model, you can use personas directly.

`_get_feed_dict` is implemented; you were looking in the wrong place: https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/knover/models/plato.py#L52 The error occurs because `latent_type_size` is not set in your config file: https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/projects/PLATO-2/32L.json#L13 For reference, this is the code path that raises the error: https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/knover/models/plato.py#L172
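A quick way to catch this kind of problem before launching a run is to check the config for the expected keys. The following is a minimal sketch; the helper name `missing_config_keys` and the sample config contents are hypothetical, and only `latent_type_size` is taken from the comment above.

```python
import json


def missing_config_keys(config, required=("latent_type_size",)):
    """Return the required keys that are absent from a loaded config dict."""
    return [key for key in required if key not in config]


# Hypothetical config text; a real config file holds many more fields.
config = json.loads('{"hidden_size": 1024}')

missing = missing_config_keys(config)
if missing:
    print("missing config keys:", ", ".join(missing))
```

For a real file, load it with `json.load(open(path))` and pass the resulting dict to the same helper.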

In my experience, data augmentation helps noticeably when the task involved has only a small amount of data.

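If the model structure is fairly simple, migration will be easier. Migration mainly involves the model structure, tokenization, and aligning the data processing. If there is demand for this, we will consider releasing that code as well later on.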

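You can prepare your corpus by following the instructions in `Knover/README.md` ( https://github.com/PaddlePaddle/Knover/blob/master/README.md ), and use the sentencepiece tool ( https://github.com/google/sentencepiece ) to generate the vocabulary; for the format, refer to `./package/dialog_en/voca.txt` and `./package/dialog_en/spm.model`. Alternatively, you can use an existing Chinese vocabulary. If you use a different tokenizer (not the sentencepiece tokenizer), you can modify `./utils/tokenization.py` and implement the corresponding tokenizer (say, `BasicTokenizer`) by following the implementation of `SentencePieceTokenizer`, then specify the tokenizer in `train_args` in the config (add a line `train_args="--tokenizer BasicTokenizer"`). https://github.com/PaddlePaddle/Knover/blob/15d5279a4370b225b0c388a129b774c9469fcde4/utils/tokenization.py#L124 For the training procedure and configuration, you can also refer to `Knover/README.md`.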
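As a rough sketch of what such a custom tokenizer might look like: the class name `BasicTokenizer` comes from the comment above, but the method names (`tokenize`, `convert_tokens_to_ids`, `convert_ids_to_tokens`), the constructor arguments, and the "id equals position in the vocabulary" convention are all assumptions for illustration. They are not a confirmed copy of the interface in `./utils/tokenization.py`, so check `SentencePieceTokenizer` for the methods the rest of the code actually calls.

```python
class BasicTokenizer(object):
    """Whitespace tokenizer backed by a token -> id vocabulary (illustrative)."""

    def __init__(self, vocab_tokens, unk_token="<unk>"):
        # Assumption: a token's id is its position in the vocabulary list.
        self.vocab = {tok: idx for idx, tok in enumerate(vocab_tokens)}
        self.inv_vocab = {idx: tok for tok, idx in self.vocab.items()}
        if unk_token not in self.vocab:
            raise ValueError("vocabulary must contain the unknown token")
        self.unk_id = self.vocab[unk_token]

    def tokenize(self, text):
        """Split text on whitespace."""
        return text.strip().split()

    def convert_tokens_to_ids(self, tokens):
        """Map tokens to ids; out-of-vocabulary tokens map to the unknown id."""
        return [self.vocab.get(tok, self.unk_id) for tok in tokens]

    def convert_ids_to_tokens(self, ids):
        """Map ids back to tokens."""
        return [self.inv_vocab[i] for i in ids]


tokenizer = BasicTokenizer(["<unk>", "hello", "world"])
ids = tokenizer.convert_tokens_to_ids(tokenizer.tokenize("hello there world"))
print(ids)
```

In this toy vocabulary, "there" is not present, so it maps to the unknown id.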

Which PaddlePaddle version are you using? I tried interacting with PLATO-XL on 4 V100 GPUs with 32GB RAM, and it works normally. ![image](https://user-images.githubusercontent.com/6134289/144851763-346fb95f-7c04-46c0-bfa9-9c3f3c34a5e1.png)

It does not support plato2's interact mode yet. We will upgrade the dygraph branch later.