codingma

Peking University, Beijing, China. New bird in NLP.

Results: 76 comments of codingma

Unable to install kafka-connect-datagen:0.1.0

Hi, I am on CentOS Linux release 7.4.1708, and my Java version is openjdk version "1.8.0_181". The docker-compose file is the latest version. After running `docker-compose up -d --build` in `examples/cp-all-in-one`, I get the following...

Unable to install kafka-connect-datagen:0.1.0

> @codemayq can you try the suggestion in [#654 (comment)](https://github.com/confluentinc/cp-docker-images/issues/654#issuecomment-499251419) to isolate whether the issue is caused by Docker or not.

With `confluent-hub install confluentinc/kafka-connect-datagen:latest --component-dir .`, the installation completes.

When will the KB snippets of Satori be released?

> It may take a few months.
> The whole Satori KB is too huge (and each entity may have many versions), so extracting an appropriate subgraph is not easy.

Thanks, understood...

seq2seq_chatbot_new and seq2seq_chatbot

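OK, thanks a lot. I came here from your Zhihu post on NLP interview experiences. Good luck with your internship, and thanks for writing the blog posts and the code.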

seq2seq_chatbot_new and seq2seq_chatbot

I think version 1.4 or above should work, and Python 3 is assumed by default. For Chinese, just adapt the input data to your needs.

After training glm2, testing shows severe repetitive generation

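Please provide more detailed training settings, such as the complete training and testing commands you used. The key information is the training method (LoRA, etc.) and whether quantization was used.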

The loss is already very low, so why do the model's answers still differ so much from the label answers?

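1. Please upgrade to the latest version.
2. The training output is a LoRA weight, which is why the weight file is so small.
3. Ideally, fine-tuning should train on the target dataset together with a general-purpose dataset; this gives better results.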
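A rough sketch of why the LoRA output is small, assuming the standard LoRA formulation in which a frozen weight matrix W of shape (d_out, d_in) is updated by a low-rank product B @ A of rank r, so only B and A are saved. The 4096 x 4096 matrix and rank 8 are illustrative values, not settings from this thread.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Parameters in a dense weight matrix W of shape (d_out, d_in)."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters in a LoRA update B @ A, with B: (d_out, r) and A: (r, d_in)."""
    return r * (d_out + d_in)


if __name__ == "__main__":
    # Illustrative (assumed) sizes: one 4096 x 4096 projection, LoRA rank 8.
    d, r = 4096, 8
    full = full_params(d, d)     # 16,777,216 parameters
    lora = lora_params(d, d, r)  # 65,536 parameters
    print(full, lora, f"{lora / full:.4%}")
```

Because only the small A and B matrices are stored per adapted layer, the saved adapter is a tiny fraction of the base model's size.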

Windows cannot recognize the dataset: datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

Please first confirm that the corresponding dataset files were downloaded successfully, and also upgrade your `datasets` version.

Windows cannot recognize the dataset: datasets.exceptions.DatasetGenerationError: An error occurred while generating the dataset

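Please double-check that the downloaded `data` folder is complete and that the file contents are correct. Loading the bundled datasets is a standard operation, and no one else has run into this problem.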
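One way to do the check suggested above is a small script that tries to parse every JSON file in the folder. This is a hypothetical helper (`find_broken_json` is not part of any library), and it assumes the dataset files are plain `.json` files directly inside `data`; other formats would need their own check.

```python
import json
from pathlib import Path


def find_broken_json(data_dir: str) -> list[str]:
    """Return names of .json files in data_dir that are empty or fail to parse."""
    broken = []
    for path in sorted(Path(data_dir).glob("*.json")):
        try:
            # Read as UTF-8 explicitly: on Windows, open() otherwise uses the
            # locale encoding, which can fail on files containing Chinese text.
            with path.open(encoding="utf-8") as f:
                json.load(f)
        except (json.JSONDecodeError, UnicodeDecodeError):
            broken.append(path.name)
    return broken


if __name__ == "__main__":
    import sys

    # Usage: python check_data.py [path/to/data]
    print(find_broken_json(sys.argv[1] if len(sys.argv) > 1 else "data"))
```

An empty list means every JSON file parsed; any names printed point to files that are truncated or corrupted and should be downloaded again.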

Pre-training efficiency issue

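I'm not sure which model you are training, but:
1. Increase `per_device_train_batch_size` as far as you can without running out of GPU memory.
2. Enabling flash attention makes training slightly faster.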
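To see what raising `per_device_train_batch_size` changes, here is a small arithmetic sketch. It assumes Hugging Face Trainer-style data-parallel semantics, where the effective batch is `per_device_train_batch_size` x `gradient_accumulation_steps` x number of GPUs; the helper names and the numbers in the example are hypothetical.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, num_devices: int) -> int:
    """Samples consumed per optimizer step under data-parallel training."""
    return per_device * grad_accum * num_devices


def optimizer_steps_per_epoch(num_samples: int, per_device: int,
                              grad_accum: int, num_devices: int) -> int:
    """Optimizer updates needed to see every sample once."""
    return math.ceil(num_samples / effective_batch_size(per_device, grad_accum, num_devices))


def micro_batches_per_epoch(num_samples: int, per_device: int, num_devices: int) -> int:
    """Forward/backward passes each device runs per epoch."""
    return math.ceil(num_samples / (per_device * num_devices))


if __name__ == "__main__":
    # Hypothetical: 100,000 samples on 1 GPU, effective batch kept at 32.
    for per_device, grad_accum in [(4, 8), (8, 4)]:
        print(per_device, grad_accum,
              optimizer_steps_per_epoch(100_000, per_device, grad_accum, 1),
              micro_batches_per_epoch(100_000, per_device, 1))
```

Doubling the per-device batch while halving gradient accumulation keeps the optimization the same (same effective batch, same number of updates) but halves the number of forward/backward passes. Larger micro-batches usually keep the GPU busier, which is where the speed-up tends to come from; available memory is what caps how far it can go.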
