
errors using gbif_backbone entity linker

Open: mpoelchau opened this issue 1 year ago • 1 comment

Thanks for publishing a really useful resource! I've used the Python version successfully with the NCBI entity linker, but when I use the gbif_backbone linker on the same dataset I get the stack trace below. Any pointers? I'm using Python 3.9.2.

$ taxonerd ask -m en_core_eco_biobert -l gbif_backbone -i reports/ -o reports/test_ann_gbif
Your CPU supports instructions that this binary was not compiled to use: SSE3 SSE4.1 SSE4.2 AVX AVX2
For maximum performance, you can install NMSLIB from sources 
pip install --no-binary :all: nmslib
Traceback (most recent call last):
  File "/project/nal_genomics/mpoelchau/taxonerd-env/bin/taxonerd", line 8, in <module>
    sys.exit(main())
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/cli.py", line 111, in main
    cli()
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/apps/python-3.9.2/lib/python3.9/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/cli.py", line 84, in ask
    nerd.load(ner_model, exclude=exclude, linker=link_to, threshold=thresh)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/taxonerd.py", line 68, in load
    self.nlp.add_pipe(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/spacy/language.py", line 801, in add_pipe
    pipe_component = self.create_pipe(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/spacy/language.py", line 680, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/confection/__init__.py", line 728, in resolve
    resolved, _ = cls._make(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/confection/__init__.py", line 777, in _make
    filled, _, resolved = cls._fill(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/confection/__init__.py", line 849, in _fill
    getter_result = getter(*args, **kwargs)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking.py", line 83, in __init__
    self.candidate_generator = candidate_generator or CandidateGenerator(
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/candidate_generation.py", line 259, in __init__
    self.kb = kb or KnowledgeBaseFactory().get_kb(name)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 158, in get_kb
    return GbifKnowledgeBase()
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 178, in __init__
    super().__init__(file_path, prefix)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 86, in __init__
    self.conn = self.json_to_sqlite(file_path, db_path)
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 99, in json_to_sqlite
    for concept in raw:
  File "/project/nal_genomics/mpoelchau/taxonerd-env/lib/python3.9/site-packages/taxonerd/linking/linking_utils.py", line 92, in <genexpr>
    raw = (json.loads(line) for line in open(cached_path(file_path)))
  File "/apps/python-3.9.2/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/apps/python-3.9.2/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/apps/python-3.9.2/lib/python3.9/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Invalid control character at: line 1 column 145 (char 144)
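
The trace shows the failure inside `json.loads(line)` while the knowledge base file is read one line at a time, so one way to narrow this down is to find which lines of that file fail strict parsing. The following is only a diagnostic sketch, not part of taxonerd: `find_bad_lines` is a hypothetical helper, and it assumes the cached file is UTF-8 text with one JSON object per line.

```python
import json


def find_bad_lines(lines, limit=5):
    """Return (line_number, error_message) pairs for lines that fail strict JSON parsing.

    `lines` is any iterable of strings, for example an open text file.
    Hypothetical helper for diagnosis only; it is not part of taxonerd.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        try:
            json.loads(line)  # strict parsing, same call as in the traceback
        except json.JSONDecodeError as err:
            bad.append((number, str(err)))
            if len(bad) >= limit:
                break  # stop early; a few examples are enough to inspect
    return bad


if __name__ == "__main__":
    # Small in-memory example: the second line has a raw tab inside a string.
    sample = ['{"id": 1}\n', '{"name": "Abies\talba"}\n']
    for number, message in find_bad_lines(sample):
        print(number, message)
```

To run it against the real file, open the locally cached copy of the GBIF knowledge base (its path is not shown in the trace) with `open(path, encoding="utf-8")` and pass the file object to `find_bad_lines`. If every line fails, the cached download itself may be damaged. If only a few lines fail, note that `json.loads(line, strict=False)` accepts control characters inside strings, which can help confirm whether that is the only problem.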

mpoelchau, Jun 28 '23 15:06