
Error when adding textrank component for language model in Python 3.12 Docker setup

Open matteosdocsity opened this issue 1 year ago • 0 comments

I encountered an issue when trying to use PyTextRank with spaCy in a Docker container running Python 3.12.7. The problem arises when I try to add the textrank component to any of the language models (Italian, it, in the example below).

Environment:

Python version: 3.12.7 (using the Docker image python:3.12.7-slim-bullseye)
spaCy versions: 3.0.5 and 3.7.4 (tested both)
PyTextRank version: 3.3.0

Steps to reproduce: Here are the commands I'm using to set up the environment in Docker:

RUN /root/.cargo/bin/uv pip install --no-cache --system spacy==3.0.5 pytextrank==3.3.0
RUN python -m spacy download it_core_news_sm
RUN python -m spacy download en_core_web_sm
RUN python -m spacy download es_core_news_sm
RUN python -m spacy download pt_core_news_sm
RUN python -m spacy download ru_core_news_sm
RUN python -m spacy download fr_core_news_sm
RUN python -m spacy download de_core_news_sm
RUN python -m spacy download pl_core_news_sm
RUN python -m spacy download xx_ent_wiki_sm

Error Message: The following error is thrown when attempting to add the textrank component to the Italian language model:

  File "/app/extractive_summary/text_rank.py", line 139, in process_chunk_summary
    nlp.add_pipe("textrank", config={"stopwords": {"word": list(self.stop_words)}}, last=True)
  File "/usr/local/lib/python3.12/site-packages/spacy/language.py", line 824, in add_pipe
    pipe_component = self.create_pipe(
                    ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/spacy/language.py", line 693, in create_pipe
    raise ValueError(err)
ValueError: [E002] Can't find factory for 'textrank' for language Italian (it). 
This usually happens when spaCy calls `nlp.create_pipe` with a custom component name that's not registered on the current language class. 
If you're using a custom component, make sure you've added the decorator `@Language.component` (for function components) or `@Language.factory` (for class components).

Expected Behavior: The textrank component should be successfully added to the Italian language model without throwing any errors.

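Additional Information: The E002 error suggests that the textrank factory was not registered with spaCy at the point where add_pipe was called, so this seems to be related to the component registration process.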
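
One thing that may help narrow this down: PyTextRank registers the textrank factory when the pytextrank module is imported, so it is worth checking whether the factory is visible to spaCy before and after that import inside the container. The following is only a diagnostic sketch, not a confirmed fix. The helper names installed_version and textrank_factory_status are hypothetical and made up for this example; the only spaCy call it relies on is Language.has_factory.

```python
import importlib
import importlib.util
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def textrank_factory_status() -> dict:
    """Report whether the 'textrank' factory is registered with spaCy.

    Hypothetical helper for debugging only. It assumes that importing
    pytextrank is what registers the factory.
    """
    report = {
        "spacy": installed_version("spacy"),
        "pytextrank": installed_version("pytextrank"),
        "registered_before_import": None,
        "registered_after_import": None,
        "error": None,
    }

    # If either package is not importable, there is nothing more to check.
    for name in ("spacy", "pytextrank"):
        if importlib.util.find_spec(name) is None:
            report["error"] = f"{name} is not importable"
            return report

    try:
        from spacy.language import Language

        # State of the registry before pytextrank has been imported here.
        report["registered_before_import"] = Language.has_factory("textrank")

        # Importing the module is expected to register the factory.
        importlib.import_module("pytextrank")
        report["registered_after_import"] = Language.has_factory("textrank")
    except Exception as exc:  # e.g. an incompatible build on this Python version
        report["error"] = f"{type(exc).__name__}: {exc}"

    return report


if __name__ == "__main__":
    for key, value in textrank_factory_status().items():
        print(f"{key}: {value}")
```

If registered_after_import comes back True, the likely culprit is that import pytextrank is not executed in the process that calls add_pipe (in text_rank.py here), and adding that import before the add_pipe call would be the first thing to try. If it comes back False, or the error field is set, a version mismatch between spaCy, PyTextRank and Python 3.12 becomes the more plausible explanation.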

matteosdocsity avatar Oct 10 '24 14:10 matteosdocsity