[Bug]: HUNFLAIR2_TUTORIAL_4_CUSTOMIZE_LINKING

Open pierrelarmande opened this issue 2 months ago • 0 comments

Describe the bug

Hello,

When I try to reproduce the code from Tutorial 4, I get the message below, and linker.predict() then returns no results.

EntityMentionLinker predicts: Dictionary `None` (entity type: disease)

To Reproduce

import json
from collections import defaultdict

from flair.data import Sentence
from flair.datasets.entity_linking import (
    EntityCandidate,
    InMemoryEntityLinkingDictionary,
)
from flair.models import EntityMentionLinker

# Load the Human Phenotype Ontology export.
with open("hp.json") as fp:
    data = json.load(fp)

# Keep only class nodes and map every label or synonym to its concept IDs.
nodes = [n for n in data['graphs'][0]['nodes'] if n.get('type') == 'CLASS']
hpo = defaultdict(list)
for node in nodes:
    concept_id = node['id'].replace('http://purl.obolibrary.org/obo/', '')
    names = [node['lbl']] + [s['val'] for s in node.get('synonym', [])]
    for name in names:
        hpo[name].append(concept_id)

database_name = "HPO"

# One candidate per name; the first ID is primary, the rest are additional IDs.
candidates = [
    EntityCandidate(
        concept_id=ids[0],
        concept_name=name,
        additional_ids=ids[1:],
        database_name=database_name,
    )
    for name, ids in hpo.items()
]

dictionary = InMemoryEntityLinkingDictionary(
    candidates=candidates, dataset_name=database_name
)

# Build a linker over the custom dictionary using dense search only.
pretrained_model = "cambridgeltl/SapBERT-from-PubMedBERT-fulltext"
linker = EntityMentionLinker.build(
    pretrained_model,
    dictionary=dictionary,
    hybrid_search=False,
    entity_type="disease",
)

sentence = Sentence(
    "The mutation in the ABCD1 gene causes X-linked adrenoleukodystrophy, "
    "a neurodegenerative disease, which is exacerbated by exposure to high "
    "levels of mercury in mouse populations."
)
linker.predict(sentence)
print(sentence)
for entity in sentence.get_spans('disease'):
    print(entity)
    for link in entity.get_labels("el"):
        print(link)
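
The name-to-ID mapping step in the script above can be checked in isolation, without flair or hp.json. The sketch below runs the same logic on a small made-up sample. The node IDs, labels, and the `build_name_index` helper are illustrative assumptions, and the sample simply mirrors the field names the script reads (`type`, `id`, `lbl`, `synonym`), which may not match the layout of the real file.

```python
from collections import defaultdict

OBO_PREFIX = "http://purl.obolibrary.org/obo/"


def build_name_index(data):
    """Map each label or synonym to the list of concept IDs that use it."""
    index = defaultdict(list)
    for node in data["graphs"][0]["nodes"]:
        # Skip anything that is not an ontology class (e.g. properties).
        if node.get("type") != "CLASS":
            continue
        concept_id = node["id"].replace(OBO_PREFIX, "")
        names = [node["lbl"]] + [s["val"] for s in node.get("synonym", [])]
        for name in names:
            index[name].append(concept_id)
    return index


# Hypothetical sample data, not taken from hp.json.
sample = {
    "graphs": [
        {
            "nodes": [
                {
                    "id": OBO_PREFIX + "HP_0000001",
                    "type": "CLASS",
                    "lbl": "example term A",
                    "synonym": [{"val": "shared name"}],
                },
                {
                    "id": OBO_PREFIX + "HP_0000002",
                    "type": "CLASS",
                    "lbl": "example term B",
                    "synonym": [{"val": "shared name"}],
                },
                {
                    "id": OBO_PREFIX + "example_property",
                    "type": "PROPERTY",
                    "lbl": "example property",
                },
            ]
        }
    ]
}

index = build_name_index(sample)
for name, ids in index.items():
    print(name, ids)
```

A name shared by several concepts ends up with several IDs, which is what the `concept_id=ids[0]` and `additional_ids=ids[1:]` split in the candidate list relies on.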

Expected behavior

X-linked adrenoleukodystrophy neurodegenerative disease

Logs and Stack traces

Embedding `HPO`: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 155/155 [00:59<00:00,  2.60it/s]
2025-10-03 08:30:28,000 EntityMentionLinker predicts: Dictionary `None` (entity type: disease)
Sentence[28]: "The mutation in the ABCD1 gene causes X-linked adrenoleukodystrophy, a neurodegenerative disease, which is exacerbated by exposure to high levels of mercury in mouse populations."

Screenshots

No response

Additional Context

No response

Environment

accelerate==1.10.1
attrs==25.3.0
beautifulsoup4==4.14.2
bioc==2.1
blis==0.7.11
boto3==1.40.41
botocore==1.40.41
catalogue==2.0.10
certifi==2025.8.3
charset-normalizer==3.4.3
click==8.3.0
confection==0.1.5
conllu==4.5.3
contourpy==1.3.2
cycler==0.12.1
cymem==2.0.11
deprecated==1.2.18
docopt==0.6.2
en-core-sci-sm @ https://s3-us-west-2.amazonaws.com/ai2-s2-scispacy/releases/v0.5.1/en_core_sci_sm-0.5.1.tar.gz
filelock==3.19.1
fixed-install-nmslib==2.1.2
flair==0.15.1
fonttools==4.60.1
fsspec==2025.9.0
ftfy==6.3.1
gdown==5.2.0
hf-xet==1.1.10
huggingface-hub==0.35.3
idna==3.10
intervaltree==3.1.0
jinja2==3.1.6
jmespath==1.0.1
joblib==1.5.2
jsonlines==4.0.0
kiwisolver==1.4.9
langcodes==3.5.0
langdetect==1.0.9
language-data==1.3.0
lxml==6.0.2
marisa-trie==1.3.1
markupsafe==3.0.3
matplotlib==3.10.6
more-itertools==10.8.0
mpld3==0.5.11
mpmath==1.3.0
murmurhash==1.0.13
networkx==3.4.2
numpy==2.2.6
packaging==25.0
pathlib-abc==0.1.1
pathy==0.11.0
pillow==11.3.0
pptree==3.1
preshed==3.0.10
protobuf==6.32.1
psutil==7.1.0
pyab3p==0.1.1
pybind11==3.0.1
pydantic==1.10.24
pyparsing==3.2.5
pysocks==1.7.1
python-dateutil==2.9.0.post0
pytorch-revgrad==0.2.0
pyyaml==6.0.3
regex==2025.9.18
requests==2.32.5
s3transfer==0.14.0
safetensors==0.6.2
scikit-learn==1.7.2
scipy==1.15.3
scispacy==0.5.1
segtok==1.5.11
sentencepiece==0.2.1
setuptools==80.9.0
six==1.17.0
smart-open==6.4.0
sortedcontainers==2.4.0
soupsieve==2.8
spacy==3.4.4
spacy-legacy==3.0.12
spacy-loggers==1.0.5
sqlitedict==2.1.0
srsly==2.5.1
sympy==1.14.0
tabulate==0.9.0
thinc==8.1.12
threadpoolctl==3.6.0
tokenizers==0.22.1
torch==2.8.0
tqdm==4.67.1
transformer-smaller-training-vocab==0.4.2
transformers==4.56.2
typer==0.7.0
typing-extensions==4.15.0
urllib3==2.5.0
wasabi==0.10.1
wcwidth==0.2.14
wikipedia-api==0.8.1
wrapt==1.17.3

pierrelarmande, Oct 03 '25 00:10