奥特曼 issues

Results: 29 issues of 奥特曼

Active learning NER

For example, I want to use spacy-llm for NER-assisted annotation. As I correct and update its predictions, how can I pass that information back to the model so that it learns actively?

feat/model

There is a significant difference between few-shot and zero-shot

When I use few-shot prompting, the extracted content is much worse than the zero-shot results. [code.zip](https://github.com/explosion/spacy-llm/files/13376626/code.zip) The attachment is my code. ![image](https://github.com/explosion/spacy-llm/assets/29837553/cc45f970-ee2e-463e-8d13-1e71ebddfe47) ![image](https://github.com/explosion/spacy-llm/assets/29837553/08321bf1-b17c-49ba-bc40-0b50cfca1bd4)

feat/task

Support for Chinese large-model APIs?

https://yiyan.baidu.com/ Could this be supported by modifying the runtime on top of the existing model API?

feat/model

feat/request

Create Named Entity Recognition Skill

This is very important. Can we do this first?

Fine-tuning large models for reading comprehension

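Hi, are there any examples of fine-tuning a large model for reading comprehension? For instance, given a long passage of medical text, how can I determine whether it mentions smoking, whether there is a history of present illness, whether the physical examination is normal, and so on?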

Inputs longer than 512 are not supported

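Could the code be changed so that inputs longer than 512 are processed in segments, while the final start and end offsets stay accurate?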
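The segmenting idea in this request can be sketched as a sliding window that remembers where each chunk begins, so that chunk-local spans can be shifted back to positions in the full text. This is a minimal sketch under stated assumptions, not the library's implementation: `chunk_text`, `extract_long`, and the `extract` callback are hypothetical names, the callback stands in for whatever model call returns `(start, end, label)` spans for a single chunk, and lengths are counted in characters for simplicity although the real 512 limit is probably measured in tokens.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def chunk_text(text: str, max_len: int = 512, overlap: int = 64) -> List[Tuple[int, str]]:
    """Split text into overlapping windows and return (offset, chunk) pairs."""
    if overlap >= max_len:
        raise ValueError("overlap must be smaller than max_len")
    chunks = []
    step = max_len - overlap
    start = 0
    while start < len(text):
        chunks.append((start, text[start:start + max_len]))
        if start + max_len >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def extract_long(
    text: str,
    extract: Callable[[str], List[Span]],  # hypothetical per-chunk extractor
    max_len: int = 512,
    overlap: int = 64,
) -> List[Span]:
    """Run `extract` on every chunk and map its spans back to the full text."""
    merged = set()
    for offset, chunk in chunk_text(text, max_len, overlap):
        for start, end, label in extract(chunk):
            # Shift chunk-local offsets by the chunk's starting position.
            # The set removes duplicates found twice inside an overlap region.
            merged.add((start + offset, end + offset, label))
    return sorted(merged)
```

The overlap exists so that an entity lying across a window boundary still appears whole in at least one chunk; an entity longer than `overlap` can still be cut, so the value should be at least the longest expected entity.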

Could the output also include probabilities?

Usability feedback

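I have a few points of feedback.
1. To make later training on all kinds of text easier, and to make annotation easier for annotators, could training accept datasets in the standard doccano or Label Studio format?
2. Could it support custom schemas for extraction, the way UIE or rexuni do? (Ideally compatible with the annotation formats above.)
3. Could it split the input automatically: when it exceeds 512, cut it into segments, run each one, and merge the results into a single final output?
4. For both entity and relation extraction, I would like the output to include start and end offsets as well as probabilities.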

enhancement

NER nested entity recognition bug

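1. When I used the NER scripts under your sequence_labeling directory, I found that all of your datasets are in BIO format. With nested entities, this format leaves some entities unlabeled.
2. The best approach would be to train on the format exported by the doccano annotation tool.
3. I hope training on datasets with nested entity annotations can be supported.

![image](https://github.com/Tongjilibo/bert4torch/assets/29837553/6fa032d2-c946-42bc-83b4-095d21fc30f3) [CMeEE-V2_dev.json](https://github.com/Tongjilibo/bert4torch/files/15478305/CMeEE-V2_dev.json)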
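The missed-label problem described above follows from BIO assigning exactly one tag per position, so two entities that cover the same characters cannot both be written down. The sketch below illustrates this with a round trip from spans to character-level BIO tags and back. The helper names, the example sentence, and the `dis` and `bod` labels are all hypothetical and are not taken from bert4torch or from the attached dataset.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start, end, label), end exclusive


def spans_to_bio(text: str, spans: List[Span]) -> List[str]:
    """Encode spans as character-level BIO tags; longer spans are written first."""
    tags = ["O"] * len(text)
    for start, end, label in sorted(spans, key=lambda s: s[0] - s[1]):
        if any(tag != "O" for tag in tags[start:end]):
            continue  # overlaps a span already written: BIO has no slot for it
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Decode BIO tags back into (start, end, label) spans."""
    spans: List[Span] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        if start is not None and not tag.startswith("I-"):
            spans.append((start, i, label))
            start = None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


text = "chronic kidney disease"
nested = [(0, 22, "dis"), (8, 14, "bod")]  # "kidney" is nested inside the disease

recovered = bio_to_spans(spans_to_bio(text, nested))
lost = sorted(set(nested) - set(recovered))
print("recovered:", recovered)
print("lost:", lost)
```

Only the outer span survives the round trip, which is why a span-based format that stores every `(start, end, label)` triple independently can represent nested annotations while BIO cannot.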