sserdoubleh comments

Results: 74 comments of sserdoubleh

Context to a conversation in PLATO-2

You can change the data format in `interaction.py`: https://github.com/PaddlePaddle/Knover/blob/15d5279a4370b225b0c388a129b774c9469fcde4/interaction.py#L69

For example:

```python
personas = ["your persona: i have three cats."]
example = Example(src=" [SEP] ".join(personas + context), data_id=0)
```

You...
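To illustrate the persona snippet in this comment, here is a minimal sketch of what the joined source string looks like. The `Example` namedtuple is a stand-in defined locally so the snippet runs on its own, and the `context` turns are invented for illustration; neither is taken from Knover.

```python
from collections import namedtuple

# Stand-in for the Example type used in interaction.py (an assumption,
# defined here only so the snippet is self-contained).
Example = namedtuple("Example", ["src", "data_id"])

personas = ["your persona: i have three cats."]
# Hypothetical conversation history, one string per turn.
context = ["hi , how are you ?", "fine , thanks ."]

# Persona lines come first, followed by the dialogue turns,
# all separated by the " [SEP] " marker.
example = Example(src=" [SEP] ".join(personas + context), data_id=0)
print(example.src)
```

The persona lines are simply prepended to the dialogue turns, so the model sees them as the earliest part of the context.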

why there is no model in model file 24L/Plato

Sorry for the late reply. If you want to get the inference model, you can use this script: https://github.com/PaddlePaddle/Knover/blob/develop/scripts/local/save_inference_model.sh . But I think there may be some bugs in the usage...

Defining knowledge based document.

We use the BST dataset in our fine-tuning stage. See https://github.com/PaddlePaddle/Knover/blob/develop/data/example/valid.tsv for an example of using personas.

Defining knowledge based document.

1. You can change the interact script (`knover/scripts/interact.py`) as described in this issue: https://github.com/PaddlePaddle/Knover/issues/24
2. If you interact with a PLATO-2 model, you can use personas directly.

Error during Stage 2.1 training

`_get_feed_dict` is implemented; you were looking in the wrong place: https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/knover/models/plato.py#L52 The error occurs because `latent_type_size` is not set in your config file: https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/projects/PLATO-2/32L.json#L13 For reference, this is the code path that raises the error: https://github.com/PaddlePaddle/Knover/blob/ac58d760973cacb163b5dc5e1be0b7c54ca75140/knover/models/plato.py#L172
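A quick way to catch this kind of problem before launching training is to check the JSON config for the required key. The sketch below is not part of Knover; the helper name `missing_keys` and the sample config contents are hypothetical and only illustrate the check.

```python
import json


def missing_keys(config_text, required=("latent_type_size",)):
    """Return the required keys that are absent from a JSON config string."""
    config = json.loads(config_text)
    return [key for key in required if key not in config]


# Hypothetical config contents, for illustration only.
broken = '{"num_layers": 32}'
fixed = '{"num_layers": 32, "latent_type_size": 20}'

print(missing_keys(broken))
print(missing_keys(fixed))
```

In practice you would read the text from your own config file instead of the inline strings used here.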

Have you tried data augmentation?

In my experience, data augmentation helps noticeably when the amount of data for the task is small.

How to migrate a model from Hugging Face

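If the model architecture is fairly simple, migration is easier. Migration mainly involves the model architecture, tokenization, and aligning the data processing. If there is demand for this, we will consider releasing the relevant code later.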

How can I train a Chinese model with this PLATO code?

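Prepare your corpus by following the instructions in `Knover/README.md` ( https://github.com/PaddlePaddle/Knover/blob/master/README.md ). You can use the sentencepiece tool ( https://github.com/google/sentencepiece ) to generate the vocabulary; for the format, refer to `./package/dialog_en/vocab.txt` and `./package/dialog_en/spm.model`. Alternatively, use an existing Chinese vocabulary. If you use another tokenizer (not the sentencepiece tokenizer), modify `./utils/tokenization.py` and implement the corresponding tokenizer (for example, named `BasicTokenizer`), using the implementation of `SentencePieceTokenizer` as a reference: https://github.com/PaddlePaddle/Knover/blob/15d5279a4370b225b0c388a129b774c9469fcde4/utils/tokenization.py#L124 Then specify the tokenizer in `train_args` in the config (add the line `train_args="--tokenizer BasicTokenizer"`). For the training procedure and configuration, also refer to `Knover/README.md`.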
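As a rough idea of what a custom tokenizer such as `BasicTokenizer` might look like, here is a minimal whitespace tokenizer with an in-memory vocabulary. The method names (`tokenize`, `convert_tokens_to_ids`, `convert_ids_to_tokens`) and the constructor arguments are assumptions made for this sketch; check the reference tokenizer in `utils/tokenization.py` for the interface Knover actually expects.

```python
class BasicTokenizer:
    """Whitespace tokenizer backed by an in-memory vocabulary (illustrative only)."""

    def __init__(self, vocab_tokens, unk_token="[UNK]"):
        # Map each token to its index, and keep the reverse mapping for decoding.
        self.vocab = {token: idx for idx, token in enumerate(vocab_tokens)}
        self.inv_vocab = {idx: token for token, idx in self.vocab.items()}
        if unk_token not in self.vocab:
            raise ValueError("the vocabulary must contain the unknown token")
        self.unk_token = unk_token

    def tokenize(self, text):
        # Assumes the text is already segmented into space-separated words.
        return text.strip().split()

    def convert_tokens_to_ids(self, tokens):
        # Out-of-vocabulary tokens fall back to the unknown-token id.
        unk_id = self.vocab[self.unk_token]
        return [self.vocab.get(token, unk_id) for token in tokens]

    def convert_ids_to_tokens(self, ids):
        return [self.inv_vocab[idx] for idx in ids]


# Hypothetical three-entry vocabulary; "朋友" is deliberately missing.
tokenizer = BasicTokenizer(["[UNK]", "你好", "世界"])
tokens = tokenizer.tokenize("你好 世界 朋友")
ids = tokenizer.convert_tokens_to_ids(tokens)
print(ids)
print(tokenizer.convert_ids_to_tokens(ids))
```

A real Chinese tokenizer would also need a word-segmentation step and the special tokens the model config relies on; this sketch leaves both out.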

Plato-XL inference.sh runs, but output appears to be garbage

Which PaddlePaddle version are you using? I tried interacting with PLATO-XL on 4 V100 GPUs with 32GB RAM, and the output was normal. ![image](https://user-images.githubusercontent.com/6134289/144851763-346fb95f-7c04-46c0-bfa9-9c3f3c34a5e1.png)

Does dygraph branch support plato2's interact mode?

The dygraph branch does not support PLATO-2's interact mode yet. We will upgrade the dygraph branch later.