plm-nlp-code issues

The `convert_tokens_to_ids` method of the `Vocab` class in Section 4.6.1 of the book is wrong

As the title says.

The code in Section 7.4.4.2 does not run

When `finetune_bert_mrc.py` loads the data, it raises the following error: `ConnectionError: Couldn't reach https://raw.githubusercontent.com/huggingface/datasets/1.10.2/datasets/squad/squad.py`. The cause is that this URL cannot be reached from mainland China.
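
One possible workaround, sketched under the assumption that the SQuAD v1.1 JSON file can be obtained by other means (a mirror or a manual download), is to parse it locally instead of letting `load_dataset` fetch the loading script. The names `flatten_squad` and `load_squad_file` are hypothetical helpers, not part of the book's code; the nested `data` / `paragraphs` / `qas` layout is the published SQuAD v1.1 format.

```python
import json


def flatten_squad(raw):
    """Flatten a SQuAD v1.1-style dict into one record per question."""
    examples = []
    for article in raw["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                examples.append({
                    "id": qa["id"],
                    "title": article.get("title", ""),
                    "context": context,
                    "question": qa["question"],
                    # Parallel lists of answer texts and character offsets,
                    # mirroring the fields of the Hugging Face `squad` dataset.
                    "answers": {
                        "text": [a["text"] for a in qa["answers"]],
                        "answer_start": [a["answer_start"] for a in qa["answers"]],
                    },
                })
    return examples


def load_squad_file(path):
    """Read a local SQuAD JSON file (the path is supplied by the caller)."""
    with open(path, "r", encoding="utf-8") as f:
        return flatten_squad(json.load(f))
```

`load_squad_file` returns a plain list of dicts. How to feed these into the rest of `finetune_bert_mrc.py` depends on the installed `datasets` version.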

namezhenzhang

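Errata I have collected (updating)

Section 4.2.2: in the line `outputs_pool2 = pool1(outputs2)`, `pool1` should be `pool2`. The code obtained via git clone may be correct and this may only be a printing error; I have not verified that.

Section 4.5.1: the formula does not fully reflect the Bernoulli distribution. "More fundamentally, the right-hand side of the cross-entropy loss formula is the log-likelihood function from maximum likelihood estimation over the distribution (Bernoulli distribution) of the multi-class outputs." ![image](https://github.com/HIT-SCIR/plm-nlp-code/assets/140282954/dc0b52fd-753d-4ac6-98fe-f238899b1d78) When y_(i)j = 0, it should be - (1 - y_(i)j) log (1...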
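
For reference, and independent of the truncated remark above, the two standard forms can be written side by side. The symbols are assumptions rather than the book's exact notation: $y$ is a label, $\hat{y}$ a predicted probability, $i$ indexes the $m$ samples and $j$ the $c$ classes. With one-hot labels, the multi-class cross-entropy keeps only the term for the true class, which is the negative log-likelihood of a categorical distribution. The two-term form containing $(1 - y)\log(1 - \hat{y})$ is the negative log-likelihood of a Bernoulli distribution, that is, the binary case.

```latex
\begin{align*}
\mathcal{L}_{\text{categorical}}
  &= -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{c}
     y^{(i)}_{j} \log \hat{y}^{(i)}_{j} \\
\mathcal{L}_{\text{Bernoulli}}
  &= -\frac{1}{m} \sum_{i=1}^{m}
     \Bigl[ y^{(i)} \log \hat{y}^{(i)}
          + \bigl(1 - y^{(i)}\bigr) \log\bigl(1 - \hat{y}^{(i)}\bigr) \Bigr]
\end{align*}
```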

ji90po

Chapter 3, Section 3.4.3.2: delete the t2s.json file

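The function call kept raising errors. After checking the OpenCC GitHub page (https://github.com/BYVoid/OpenCC), I found that the many examples on its front page run directly without any xxx.json file, and that having the JSON file in the folder actually causes the error. Be sure to delete the JSON config file from the folder.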

ji90po

Chapter 3, Section 3.4.3.1: wikiextractor problems

There are quite a few installation problems (corpus: https://dumps.wikimedia.org/zhwiki/latest/). 1) If you hit an error like the one below: `raise source.error('global flags not at the start '` `re.error: global flags not at the start of the expression at position 4`, be sure to downgrade Python to 3.10...
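
The `re.error` quoted above is consistent with a change in Python 3.11: an inline global flag such as `(?i)` placed anywhere but the start of a pattern became an error, whereas earlier versions only emitted a `DeprecationWarning`. A minimal sketch of that behavior follows; the patterns are hypothetical and are not taken from wikiextractor.

```python
import re
import sys
import warnings

# Hypothetical pattern with a global flag in the middle (illustration only).
BAD_PATTERN = r"abc(?i)def"
# Same intent, with the global flag moved to the very start of the pattern.
GOOD_PATTERN = r"(?i)abcdef"


def compiles(pattern):
    """Return True if `pattern` compiles on the running interpreter."""
    with warnings.catch_warnings():
        # Older interpreters warn instead of failing; silence that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            re.compile(pattern)
        except re.error:
            return False
    return True


print(sys.version_info[:2], compiles(BAD_PATTERN), compiles(GOOD_PATTERN))
```

Downgrading the interpreter, as suggested above, avoids the error. Moving the flag to the start of the offending pattern (or passing `re.IGNORECASE` to `re.compile`) is the other common route, assuming you are willing to patch the installed package.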

ji90po

Chapter 3: problem with the sent_split function

    >>> from ltp import StnSplit
    >>> from ltp import LTP
    >>> ltp = LTP()
    >>> sents2 = StnSplit().batch_split(["南京市长江大桥。", "汤姆生病了。他去了医院。"])
    >>> sents2
    ['南京市长江大桥。', '汤姆生病了。', '他去了医院。']
    >>> segment = ltp.pipeline(sents2, tasks=['cws'], return_dict=False)
    >>> segment
    ([['南京市', '长江', '大桥', '。'], ['汤姆', '生病',...
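
For readers who only want to see what sentence splitting produces and do not have LTP installed, here is a rough regex-based stand-in. It is not LTP's `StnSplit` implementation, and the function name `simple_split` is hypothetical; it merely cuts after `。`, `！`, and `？`.

```python
import re

# One sentence = a run of non-terminator characters plus an optional terminator.
_SENTENCE = re.compile(r"[^。！？]+[。！？]?")


def simple_split(texts):
    """Split each string in `texts` into sentences; return one flat list."""
    sentences = []
    for text in texts:
        sentences.extend(s.strip() for s in _SENTENCE.findall(text))
    # Drop fragments that were only whitespace.
    return [s for s in sentences if s]


print(simple_split(["南京市长江大桥。", "汤姆生病了。他去了医院。"]))
```

On the two inputs used above, this returns the same three sentences as the `batch_split` call. It makes no attempt to handle quotes, ellipses, or repeated punctuation.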

ji90po

Chapter 2, word segmentation: why segmentation fails with the load_dict function

The `load_dict` function should not use `fopen(XXX, 'rb')`, which only splits the text into individual Chinese characters. It should be `fopen(XXX, 'r', encoding='UTF-8')`. ![image](https://github.com/HIT-SCIR/plm-nlp-code/assets/140282954/2d52a174-8401-421e-b58f-182b527f7ed8)
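
A likely explanation is that a file opened in binary mode yields `bytes`, so no `str` substring of the input ever matches a lexicon entry and the segmenter falls back to single characters. The sketch below illustrates this with Python's built-in `open`. The `binary` parameter, the temporary file, and its three words are assumptions made for the demonstration and are not the book's code.

```python
import os
import tempfile


def load_dict(path, binary=False):
    """Load a one-word-per-line lexicon; return (set of words, max word length)."""
    if binary:
        f = open(path, "rb")                   # lines come back as bytes
    else:
        f = open(path, "r", encoding="utf-8")  # lines come back as str
    with f:
        lexicon = {line.strip() for line in f}
    max_len = max((len(word) for word in lexicon), default=0)
    return lexicon, max_len


# Demo with a throwaway lexicon file (contents are illustrative only).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "lexicon.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("研究\n生命\n起源\n")
    text_lexicon, text_max = load_dict(path)
    bytes_lexicon, bytes_max = load_dict(path, binary=True)

print("研究" in text_lexicon)   # True: str entries match str text
print("研究" in bytes_lexicon)  # False: bytes entries never equal a str
```

The maximum word length is also distorted in binary mode, because it counts UTF-8 bytes rather than characters.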

ji90po

Question about `from datasets import load_dataset, load_metric` in Chapter 7

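All four BERT scripts in Chapter 7 contain the import `from datasets import load_dataset, load_metric`. May I ask whether `datasets` is a module written by the authors, or whether it should simply be installed with `pip install datasets`?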

Really-Nice

Section 3.2.1: the LTP word segmentation example is wrong

```
from ltp import LTP

ltp = LTP()
# segment, hidden = ltp.seg(['南京市长江大桥。'])  # raises an error
# Changed to:
segment = ltp.pipeline(['南京市长江大桥。'], tasks=['cws'], return_dict=False)
print(segment)
```

todochenxi

The comment in ffnnlm.py appears to be wrong

According to the section content in the book, the first-line comment of Chapter 5's ffnnlm.py should be changed to `# Defined in Section 5.1.3.2`.

Learning-WangXunyi
