EasyNLP issues

Results: 46 EasyNLP issues, sorted by recently updated

Support Cognitive Tree

Hi, community, when will Cognitive Tree be supported? From EMNLP 2023: https://aclanthology.org/2023.findings-emnlp.828/

liuxiaocs7

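I am continuing pre-training of CKBert on my own domain corpus. With a large corpus (12 GB), the machine reboots automatically once training has run for a while; with a small corpus (2 GB) the problem does not occur. Watching memory usage during training, I found that memory consumption grows steadily as training progresses until it exhausts all available memory. Has anyone run into the same problem? I would be very grateful for any reply! My training parameters are below:

export CUDA_VISIBLE_DEVICES=0,1 gpu_number=2 negative_e_number=4 negative_e_length=16 python -m torch.distributed.launch --nproc_per_node=$gpu_number \ --master_port=52349 \ $base_dir/main.py \ --mode=train \ --worker_gpu=$gpu_number \ --tables=$local_train_file, \ --learning_rate=1e-3 \ --epoch_num=1 \ --logging_steps=100 \...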

rainfallLLF
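
The report above describes host memory that keeps growing as training progresses. One way to narrow down such growth is to compare allocation snapshots taken some steps apart. The following is only a minimal sketch using Python's standard `tracemalloc` module; `leaky_step` and `cache` are hypothetical stand-ins for a training step that retains data, not EasyNLP code, and `tracemalloc` only sees allocations made through Python's allocator, so memory held by native libraries will not show up here.

```python
import tracemalloc


def top_growth(before, after, limit=3):
    """Return up to `limit` source lines whose allocated size grew the most."""
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


def leaky_step(cache, step):
    # Hypothetical training step that keeps one buffer alive per step.
    cache.append(bytearray(100_000))


tracemalloc.start()
before = tracemalloc.take_snapshot()

cache = []
for step in range(50):
    leaky_step(cache, step)

after = tracemalloc.take_snapshot()
growth = top_growth(before, after)
for stat in growth:
    # Each entry names the file and line responsible for the growth.
    print(stat)
tracemalloc.stop()
```

In a real run the two snapshots would be taken a few hundred training steps apart; a line whose size keeps increasing between snapshots is a candidate for the retained data.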

Compiling the tokenizer fails

The build fails on both macOS and Linux with: error[E0432]: unresolved import `serde::export`. Switching Rust versions did not help.

ycwdaaaa

SpanProto evaluation from checkpoint doesn't reproduce the f1 score

Hi SpanProto team, as described in the README in the span-proto directory, I trained on the Few-NERD 5-way 5-shot inter dataset. The prediction result shows an F1 score of ~0.82, as presented in...

sayef

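SpanProto recall is always 1.0

Hello! After running the script you provided, I found that recall is always 1.0. Could the following line be the cause (the gold labels are concatenated in here, so they leak into the later evaluation)? https://github.com/wjn1996/SpanProto/blob/f1e0acb8672f0bfcbb7c827c48b06b3e8ccb295a/models/span_proto.py#L588 In addition, after changing this line to `query_all_spans = query_predict_spans`, the results differ considerably from those in the paper, and I am not sure which side is wrong. FEW-NERD 5way-1shot: inter: span_f1: 0.5826, class_f1: 0.4618; intra: span_f1: 0.4606, class_f1: 0.3548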

JayShJi
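
The leakage suspected in the report above can be illustrated with a toy calculation: if the gold spans are merged into the predicted spans before scoring, every gold span counts as found, so recall is 1.0 no matter what the model predicted. This is a minimal sketch under assumed data; `span_prf` and the example spans are hypothetical and are not taken from the SpanProto repository.

```python
def span_prf(pred_spans, gold_spans):
    """Compute span-level precision, recall and F1 over (start, end) tuples."""
    pred, gold = set(pred_spans), set(gold_spans)
    tp = len(pred & gold)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical gold and predicted spans for one query sentence.
gold = [(0, 2), (5, 6), (9, 11)]
predicted = [(0, 2), (4, 6)]

# Scoring the predictions alone.
honest = span_prf(predicted, gold)

# Scoring after the gold spans have been concatenated into the predictions.
leaky = span_prf(predicted + gold, gold)

print("without leak:", honest)
print("with leak:   ", leaky)
```

With the gold spans mixed in, only precision still reflects the model's own errors, which is consistent with a recall that stays at 1.0.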

Request for data preprocessing code in the AGREE project

Dear collaborators, I can't find the data preprocessing code in the [AGREE](https://github.com/alibaba/EasyNLP/tree/master/examples/agree) project, and I would like to know how to preprocess the raw data. Could you release the preprocessing code?

Gzy1112

Models and weights for ConaCLIP

Hi, is there any way you could release the ConaCLIP models and weights soon? I am referring to this paper: [ConaCLIP](https://aclanthology.org/2023.acl-industry.8.pdf)

justlike-prog

Citation in the MTA paper

Dear colleagues, the official PromptCBLUE paper is out at https://arxiv.org/pdf/2310.14151.pdf. Would you mind updating the citation in your excellent MTA paper?

michael-wzhu

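Recall drops sharply after adding new entity and relation types in the same business scenario and texts

Environment: python3.7, paddlepaddle-gpu 2.4.2.post117, paddlenlp 2.5.2, cuda-version 11.7, GPU: A800.

Problem description: I am extracting relations from contract texts, which range from a few thousand to ten thousand characters in length. Because some relations involve entities that are far apart (this is uncommon), each contract was annotated as a whole document. In the first version, entity and relation extraction reached a recall and F1 of about 0.85. In the second version, which added new entity types and relation types, recall is 0.3 and F1 is 0.4, far below the first version. Moreover, for the entities and relations that appear in both versions' extraction schemas, the second version also performs much worse than the first. I wonder whether the other data annotated for the second version has degraded the overall results.

e.g.: The contract texts cover multiple categories, and 700 files were annotated in total. The category distribution is uneven: some categories have more than 100 samples and others only a dozen or so, yet extraction is poor even for some categories with about 100 samples. To verify this, I extracted the samples of one such category and ran training and inference on them alone; the results were still poor with only that category.

![微信图片_20230922115050](https://github.com/alibaba/EasyNLP/assets/145737041/f4c39be6-7f98-4849-865d-81a8461ab402)

e.g.: The commented-out schema is the version that performs somewhat better; the uncommented one is the version that performs worse.

![微信图片_20230922115038](https://github.com/alibaba/EasyNLP/assets/145737041/37735257-9540-46fc-b1a4-d4bf6a81bb0b)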

1happyWDC

Does BeautifulPrompt support Chinese?

CKBert continued pre-training runs out of memory
