SciSpacy Python 3.11 Installation Fails / Broken nmslib dependency

Open ksaadDE opened this issue 1 year ago • 6 comments

python3 -m pip install scispacy

bin/python (venv) --> Python 3.11.6

bin/pip --version (venv) --> 3.11.6

Building wheel for nmslib (pyproject.toml)
include -I/usr/include/python3.11 -c nmslib.cc -o build/temp.linux-x86_64-cpython-311/nmslib.o -O3 -march=native -fopenmp -DVERSION_INFO=\"2.1.1\" -std=c++14 -fvisibility=hidden
python3.11/site-packages/pybind11/include/pybind11/attr.h:310:20: Err: »const struct pybind11::detail::function_record
error: command '/usr/bin/gcc' failed with exit code 
Failed to build nmslib
ERROR: Could not build wheels for nmslib, which is required to install pyproject.toml-based projects

ksaadDE avatar Feb 03 '24 15:02 ksaadDE

Luckily enough, nobody needs to install the entire scispacy library to just obtain the Abbreviation Extraction utility :) https://github.com/allenai/scispacy/blob/main/scispacy/abbreviation.py

Just in case someone needs it as well. To include and use it:

from filename import AbbreviationDetector  # "filename" = the module you saved abbreviation.py as; the import registers the pipe
loaded_nlp_model.add_pipe('abbreviation_detector')

Example code, partially ~~stolen~~ borrowed from StackOverflow

import spacy
from filename import AbbreviationDetector

def filter_abbrv(loaded_nlp_model, txtData):
    # Add the detector only once; add_pipe raises if the name is already in the pipeline
    if 'abbreviation_detector' not in loaded_nlp_model.pipe_names:
        loaded_nlp_model.add_pipe('abbreviation_detector')
    doc = loaded_nlp_model(txtData)
    altered_tok = [tok.text for tok in doc]
    print("abbrv:", doc._.abbreviations)
    # Overwrite the first token of each short form with its long form
    for abrv in doc._.abbreviations:
        altered_tok[abrv.start] = str(abrv._.long_form)
    return " ".join(altered_tok)

loaded_nlp_model = spacy.load("en_core_web_lg") # or whatever
print(filter_abbrv(loaded_nlp_model, "StackOverflow (SO) and Github are pretty cool"))

adding_abbreviation_detection_to_your_spacy_nlp_project.md
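
Side note on the replacement step: the snippet above only overwrites the first token of each short form, so a short form spanning several tokens would leave leftovers behind. A rough sketch of replacing the whole span instead, written in plain Python so it runs without spaCy. The function name `expand_abbreviations` and the `(start, end, long_form)` tuples are made up for illustration; they stand in for `abrv.start`, `abrv.end` and `str(abrv._.long_form)`, and the sketch assumes the spans do not overlap.

```python
def expand_abbreviations(tokens, abbreviations):
    """Replace each short-form token span with its long form.

    tokens: list of token strings.
    abbreviations: iterable of (start, end, long_form) tuples, end exclusive
    (hypothetical stand-ins for abrv.start, abrv.end, str(abrv._.long_form)).
    Assumes the spans do not overlap.
    """
    out = list(tokens)
    # Splice from right to left so earlier indices stay valid after each replacement.
    for start, end, long_form in sorted(abbreviations, reverse=True):
        out[start:end] = [long_form]
    return " ".join(out)


tokens = ["StackOverflow", "(", "SO", ")", "and", "Github", "are", "pretty", "cool"]
print(expand_abbreviations(tokens, [(2, 3, "StackOverflow")]))
```

With a real `doc`, the tuples could presumably be built as `[(a.start, a.end, str(a._.long_form)) for a in doc._.abbreviations]`.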

ksaadDE avatar Feb 03 '24 16:02 ksaadDE

Has anyone figured out a work-around for this for the functionalities which require scispacy or even nmslib?

mp-lunartree-bio avatar May 23 '24 12:05 mp-lunartree-bio

Hi, you may have some luck with this workaround here: https://github.com/allenai/scispacy/issues/473#issuecomment-1590443024

dakinggg avatar Jun 07 '24 07:06 dakinggg

My workaround was to install everything in a Python 3.9 Anaconda environment. Annoying, but it works.

ddofer avatar Jun 13 '24 09:06 ddofer

@dakinggg Do you have a workaround for Databricks ML? I've run out of tricks; I cannot get nmslib to install on 3.11 or 3.10.

ulc0 avatar Jul 29 '24 22:07 ulc0

Based on https://github.com/allenai/scispacy/issues/520#issue-2438749767, I was able to get it working on both Windows and WSL with Python 3.11 by installing with mamba. Could others on this thread try that and let me know if it works? If so, I will update the installation instructions.

dakinggg avatar Aug 11 '24 22:08 dakinggg