
[Feature]: Add support for MultiCoNER v2 Dataset

Open • stefan-it opened this issue 10 months ago • 1 comment

Problem statement

Hi,

there's a new EMNLP 2023 paper that introduces version 2 of the MultiCoNER dataset.

MultiCoNER v2 should also be supported in Flair :hugs:

Solution

The dataset is hosted on the Hugging Face Hub:

https://huggingface.co/datasets/MultiCoNER/multiconer_v2/tree/main

Train, development and test files can also be accessed there, e.g. see the files for German:

https://huggingface.co/datasets/MultiCoNER/multiconer_v2/tree/main/DE-German

Additional Context

We should discuss whether we can extend the existing NER_MULTI_CONER implementation and add a version parameter to it:

https://github.com/flairNLP/flair/blob/ed53c42ec2e8d8abbd07acd7f6b531945ac45606/flair/datasets/sequence_labeling.py#L3048C7-L3055

class NER_MULTI_CONER(MultiFileColumnCorpus):
    def __init__(
        self,
        task: str = "multi",
        version: str = "v1",
        base_path: Optional[Union[str, Path]] = None,
        in_memory: bool = True,
        **corpusargs,
    ) -> None:

The version parameter would then default to v1 to ensure backward compatibility :thinking:
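To make the backward-compatibility idea concrete, here is a minimal sketch of how such a version switch could behave. It is not Flair code: VersionedCorpusSketch, SUPPORTED_VERSIONS and the per-version folder layout are hypothetical names chosen for illustration, and the class deliberately does not subclass MultiFileColumnCorpus or download anything.

```python
from pathlib import Path
from typing import Optional, Union

# Assumption: only these two version tags exist; "v1" stays the default.
SUPPORTED_VERSIONS = ("v1", "v2")


class VersionedCorpusSketch:
    """Hypothetical stand-in for NER_MULTI_CONER; it does not use Flair."""

    def __init__(
        self,
        task: str = "multi",
        version: str = "v1",
        base_path: Optional[Union[str, Path]] = None,
        in_memory: bool = True,
        **corpusargs,
    ) -> None:
        # Reject unknown tags early instead of silently falling back to v1.
        if version not in SUPPORTED_VERSIONS:
            raise ValueError(
                f"Unknown version {version!r}; expected one of {SUPPORTED_VERSIONS}"
            )
        self.task = task
        self.version = version
        self.in_memory = in_memory
        # Hypothetical layout: one sub-folder per version, so files from
        # v1 and v2 never end up in the same directory.
        root = Path(base_path) if base_path is not None else Path("datasets")
        self.data_folder = root / f"multiconer_{version}" / task
```

With this shape, existing calls that omit version keep resolving to v1, while VersionedCorpusSketch(task="de", version="v2") would point at a separate folder. How v2 tasks map onto the Hub directory names (e.g. DE-German) is left open here.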

stefan-it, Oct 23 '23 21:10

I agree @stefan-it - that would be great to add!

alanakbik, Oct 24 '23 07:10