The version of FewNERD

Open • dongguanting opened this issue 3 years ago • 9 comments

Hi @iofu728. It seems that the open-source dataset “episode-data” is the arXiv version of Few-NERD? The results I reproduced are very different from those in the paper. Did you perhaps use the ACL version of Few-NERD in the paper?

dongguanting · Aug 26 '22 08:08

Hi @dongguanting, we don't hold the copyright to the Few-NERD dataset, so please contact its owners. We already state which Few-NERD version we use in footnote 5 of our paper, and we show results on both the ACL and arXiv versions of Few-NERD in our GitHub repo.

iofu728 · Aug 26 '22 09:08

Thanks a lot for your reply. I still have a question about testing the cross-dataset scenario. How should I set up the script to reproduce the setting in your paper (2 datasets for training, 1 for validation, 1 for testing)? Does this mean I need to run 2 rounds of training (span and type) on 2 different ner_train.json files?

dongguanting · Aug 28 '22 06:08

Hi @dongguanting, not really. For the cross-domain dataset, you only need to train once on the training set (span + type) and then evaluate directly. During training, the model sees all task data from both training domains. In our scripts, set the dataset to Domain and use a different N to select the domain.

```bash
N=1  # domain split: 1, 2, 3 or 4
K=1  # number of shots: 1 or 5

...
--dataset Domain \
```

iofu728 · Aug 29 '22 09:08
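
Per the reply above, there are no separate passes per ner_train.json; presumably each (N, K) pair is one training run (span + type) followed by evaluation, so covering every cross-domain setting takes 4 × 2 = 8 runs. A minimal dry-run sketch of that grid, assuming a hypothetical wrapper script `run_cross_domain.sh` that reads `N` and `K` from the environment and passes `--dataset Domain` as in the snippet. The repo's real script name and the way it takes N and K may differ, so the sketch only prints the commands.

```bash
#!/usr/bin/env bash
# Dry-run sketch: list one command per cross-domain setting.
# ENTRY is a HYPOTHETICAL wrapper around the repo's training script that
# reads N and K from the environment and passes --dataset Domain.
set -euo pipefail

ENTRY="${ENTRY:-./run_cross_domain.sh}"

cross_domain_commands() {
  local n k
  for n in 1 2 3 4; do   # N selects the domain split
    for k in 1 5; do     # K is the number of shots
      printf 'N=%s K=%s %s\n' "$n" "$k" "$ENTRY"
    done
  done
}

cross_domain_commands
```

Keeping it as a dry run makes it easy to check the grid before launching anything; replacing the `printf` with a real invocation of your script would start the runs.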

Could it be that the results for the ACL and arXiv versions are swapped in this repo? (The F1 on the Few-NERD arXiv version is higher, but in your repo the ACL-version result is higher.) Also, I previously downloaded the arXiv version of episode-data (568MB; that link is no longer available). The only version of episode-data that can now be downloaded from the Few-NERD website (500MB) is probably the ACL version.

liyongqi2002 · Sep 08 '22 05:09

Hi @liyongqi2002, thanks for the reminder. There are some problems with how we present the Few-NERD dataset versions, and I will fix them as soon as possible. In fact, the first table reports results on the Few-NERD arXiv v5 version (542MB, downloaded from the URL in https://github.com/thunlp/Few-NERD/commit/cb16dc48562f0017c74492a906f461a6947a4219#diff-5f38df55c9cead1d256aa12d97c6ed0244fcbda7e36124fa9ec53014750d1283R23), which is also used by CONTAINER and ESD. The second table reports results on the Few-NERD arXiv v6 version (https://arxiv.org/pdf/2105.07464v6.pdf; 500MB, downloaded from the URL in https://github.com/thunlp/Few-NERD/commit/e32907982dac9956aaa603c28b57138b192fe6c0#diff-5f38df55c9cead1d256aa12d97c6ed0244fcbda7e36124fa9ec53014750d1283R20), which is also used by ESD.

iofu728 · Sep 10 '22 03:09

Thanks for your reply. So the results to compare against now are those in the second table (using the 500MB episode data, which is also what https://paperswithcode.com/sota/few-shot-ner-on-few-nerd-inter reports). Is my understanding correct?

liyongqi2002 · Sep 10 '22 03:09

Yes, you can compare against the results in the second table using the 500MB episode data.

iofu728 · Sep 10 '22 03:09
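
The thread tells the two episode-data releases apart mainly by archive size: about 542MB for arXiv v5 (the first table, per the maintainer's reply) and about 500MB for arXiv v6 (the second table). A rough sketch for guessing which one a local archive is from its size alone. The helper names and the 508 MiB cutoff are assumptions of this sketch, not anything published by either project; the cutoff is chosen to fall between the two sizes whether they were quoted in MB or MiB. Incidentally, 542 MiB is about 568 MB, so the 568MB and 542MB figures in this thread may describe the same archive in different units.

```bash
#!/usr/bin/env bash
# Heuristic sketch: guess the Few-NERD episode-data release from file size.
# Helper names and the 508 MiB cutoff are ASSUMPTIONS, not official values.
set -euo pipefail

# Convert a byte count to whole MiB (1 MiB = 1048576 bytes).
bytes_to_mib() {
  echo $(( $1 / 1048576 ))
}

# Map a byte count to the release it most likely corresponds to.
guess_release_from_bytes() {
  local mib
  mib="$(bytes_to_mib "$1")"
  if (( mib >= 508 )); then
    echo "arxiv-v5 (~542MB, first table)"
  else
    echo "arxiv-v6 (~500MB, second table)"
  fi
}

# Read an archive's size on disk; wc -c works with GNU and BSD tools,
# and tr strips the padding BSD wc adds.
guess_release() {
  local bytes
  bytes="$(wc -c < "$1" | tr -d ' ')"
  guess_release_from_bytes "$bytes"
}
```

For example, `guess_release episode-data.zip`. A size check can only hint at the release; downloading from the commit-pinned links in the maintainer's reply is the reliable way to get a specific version.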

@dongguanting I'm also trying the code, but it reports that episode-data/inter/... is missing. Where can I obtain this dataset? I downloaded the Few-NERD dataset, but it only contains .txt files. Thanks

GenVr · Sep 21 '22 16:09

Hi @GenVr, you can download the arXiv v6 version of the Few-NERD dataset by following the script in their repo: https://github.com/thunlp/Few-NERD/blob/main/data/download.sh#L20-L22.

iofu728 · Sep 23 '22 02:09
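
Following the pointer above, a minimal sketch that wraps the Few-NERD download script and then checks that the episode data has the `inter/` and `intra/` subfolders (the Few-NERD Inter and Intra settings) before training. It assumes `data/download.sh` accepts `episode-data` as its first argument and leaves `data/episode-data.zip`, as the Few-NERD README's usage suggests; check the current README if the script has changed. It also assumes the unzipped folder is what the vert-papers code expects at `episode-data/` (the missing `episode-data/inter/...` path in the error above points that way). Both function names are hypothetical helpers, not part of either repository.

```bash
#!/usr/bin/env bash
# Sketch: fetch Few-NERD episode data and sanity-check its layout.
# fetch_episode_data / check_episode_data are HYPOTHETICAL helper names.
set -euo pipefail

# Clone thunlp/Few-NERD, download the episode data with its script, unzip.
# ASSUMPTION: download.sh takes "episode-data" and writes data/episode-data.zip.
fetch_episode_data() {
  local dest="${1:-Few-NERD}"
  git clone https://github.com/thunlp/Few-NERD.git "$dest"
  (
    cd "$dest"
    bash data/download.sh episode-data
    unzip -d data/ data/episode-data.zip
  )
}

# Return 0 if DIR contains inter/ and intra/; otherwise print each
# missing subfolder to stderr and return 1.
check_episode_data() {
  local dir="$1" missing=0 sub
  for sub in inter intra; do
    if [[ ! -d "$dir/$sub" ]]; then
      echo "missing: $dir/$sub" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `fetch_episode_data && check_episode_data Few-NERD/data/episode-data`, then make that folder available as `episode-data/` where the vert-papers code runs (a copy or a symlink both work).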