
Segmentation fault from rake.apply function

Open birgitbartels opened this issue 6 months ago • 0 comments

Hello everyone,

I want to use the multi_rake keyword extractor, but my script keeps crashing with a segmentation fault. The crash seems to come from the line "keywords = rake.apply(text=text)".

I created a class that wraps the Rake extractor and wanted to run it on a short Dutch text:

from multi_rake import Rake

class RakeKeywordExtractor():

    def __init__(self):
        # These are the default values, but we might want to adapt them!
        self.rake = Rake()

    def get_keywords(self, text, limit=None):
        # Extract once, then optionally truncate to the first `limit` results.
        keywords = self.rake.apply(text=text)
        return keywords[:limit] if limit else keywords
        
keyword_extractor = RakeKeywordExtractor()


tekst = """
De oorzaak van aften is niet bekend. We denken dat ze makkelijker ontstaan bij 1 of meer van deze dingen:

kleine wondjes in uw mond, bijvoorbeeld door:
bijten op uw wang
tandenpoetsen of flossen
een kunstgebit dat niet goed past
droge mond
stress
veranderingen in hormonen, bijvoorbeeld door ongesteld zijn of zwanger zijn
erfelijke aanleg: dit betekent dat veel mensen in uw familie aften hebben
heel soms bij te weinig ijzer, vitamine B12, of foliumzuur in uw bloed.
heel soms zijn aften een bijwerking van medicijnen
Bijvoorbeeld van sterke pijnstillers (fentanyl) of medicijnen bij kanker.
Er is geen bewijs dat deze dingen aften veroorzaken.
"""

keywords = keyword_extractor.get_keywords(tekst)
print("These are the keywords:")
for keyword in keywords:
    print(keyword)

I enabled faulthandler to get more information about the segmentation fault and got this:

Fatal Python error: Segmentation fault

Current thread 0x00000001ddd42080 (most recent call first):
  File "...venv/lib/python3.11/site-packages/multi_rake/utils.py", line 14 in detect_language
  File "...venv/lib/python3.11/site-packages/multi_rake/algorithm.py", line 62 in apply
  File "...backend/src/services/keyword_extraction/rake.py", line 18 in get_keywords
  File "...backend/src/services/keyword_extraction/rake.py", line 40 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pvectorc, pycld2._pycld2, regex._regex (total: 16)
[1]    12942 segmentation fault  venv/bin/python -Xfaulthandler 
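
For anyone reproducing this without the -X faulthandler flag, the same kind of dump can be requested from inside the script. This is a minimal sketch using only the standard library; the helper name enable_crash_dumps and the log file name are hypothetical, not part of multi_rake.

```python
import faulthandler


def enable_crash_dumps(log_path):
    """Write a Python traceback to log_path if the interpreter hits a fatal signal.

    The returned file object must stay open for as long as faulthandler is
    enabled, so the caller should keep a reference to it.
    """
    log_file = open(log_path, "w")
    # all_threads=True dumps every thread's stack, not only the crashing one.
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Hypothetical file name; call this before importing or using multi_rake.
    crash_log = enable_crash_dumps("crash_traceback.log")
    print("faulthandler enabled:", faulthandler.is_enabled())
```

Writing to a file rather than stderr helps when the process runs somewhere its stderr is not captured.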

The traceback points to the detect_language function in multi_rake/utils.py (line 14), which is called from apply in multi_rake/algorithm.py.
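
To narrow this down, the suspect call could be run in a separate interpreter so that a crash there does not take down the main script. The sketch below uses only the standard library; run_isolated is a hypothetical helper. The example snippet assumes that detect_language ends up calling pycld2.detect, which I have not verified beyond pycld2._pycld2 showing up in the extension module list.

```python
import subprocess
import sys


def run_isolated(snippet, timeout=60):
    """Run a Python snippet in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX a negative return code means the
    child was killed by a signal, e.g. -11 for SIGSEGV on Linux and macOS.
    """
    result = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Assumption: calling pycld2 directly exercises the same native code
    # that multi_rake's language detection uses.
    snippet = (
        "import pycld2; "
        "print(pycld2.detect('De oorzaak van aften is niet bekend.'))"
    )
    code, err = run_isolated(snippet)
    print("return code:", code)
    print(err)
```

If the child dies with a signal here as well, the problem would be in the pycld2 build rather than in multi_rake itself.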

Does anybody know what is causing this segmentation fault and how I can resolve it?

Thank you!

Kind regards,

Birgit Bartels

birgitbartels, Dec 29 '23 14:12