python-boilerpipe

A fatal error has been detected by the Java Runtime Environment: SIGSEGV (0xb)

Status: Open • rshiva opened this issue 11 years ago • 9 comments

Hey, python-boilerpipe works perfectly from the console and as a script, but when I try it in my Flask application it breaks. It breaks when I instantiate Extractor and pass the URL. This is what I get:

http://pastebin.com/Rhzfh3hE

Initially I thought this problem came from JPype, so I raised a ticket there too. It didn't help much: https://github.com/originell/jpype/issues/22

Environment details

Python - 2.7.3
java version "1.7.0_45"

Flask==0.10.1
JPype1==0.5.4.5
boilerpipe==1.2.0.0

I did see a similar issue raised before, but it didn't help much :-/ Any help will be appreciated. Thanks!

rshiva · Jan 10 '14 06:01

So the snippet posted in originell/jpype#22 does not help? If this is a boilerpipe issue, I can close the issue in our jpype fork. ;>

originell · Jan 28 '14 22:01

I don't know. As @tcalmant mentioned, I can start the JVM and attach the thread. I think the problem is with boilerpipe. I have also posted on Stack Overflow, which may give you a better idea of the problem: http://stackoverflow.com/questions/21310011/jvm-crashes-while-implementing-python-boilerpipe-in-flask-app

rshiva · Jan 29 '14 08:01

According to the trace posted on pastebin, this is a class loading problem. I suppose it comes from lines 56-57 in boilerpipe/extract/__init__.py, where JPype is used to load the specified extractor.

Could you add some traces around these lines? (Use a logger, and/or don't forget to flush sys.stdout/sys.stderr.) Also, is the buggy code public? Or do you have a snippet with similar behaviour? I'll look into the problem this evening (European time zone).
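A minimal sketch of such tracing, using only the standard `logging` module. Flushing after each message matters here: when the JVM hits a SIGSEGV it aborts the whole process, so anything still buffered on the Python side never reaches the log. The helper `traced` and the logger name are hypothetical.

```python
import contextlib
import logging
import sys

log = logging.getLogger("boilerpipe.trace")  # hypothetical logger name


def _flush_all():
    # Push buffered log records and stdio out before a possible JVM crash.
    for handler in log.handlers + logging.getLogger().handlers:
        handler.flush()
    sys.stdout.flush()
    sys.stderr.flush()


@contextlib.contextmanager
def traced(label):
    """Log and flush before and after the wrapped block."""
    log.info("enter %s", label)
    _flush_all()
    try:
        yield
    except Exception:
        log.exception("error in %s", label)
        _flush_all()
        raise
    log.info("exit %s", label)
    _flush_all()


# Usage around the suspect call (not executed here):
#
# logging.basicConfig(level=logging.INFO)
# with traced("Extractor(%s, %s)" % (extractorType, sourceUrl)):
#     extractor = Extractor(extractor=extractorType, url=sourceUrl)
```

If the process dies inside the `with` block, the last line in the log is the "enter" message, which pins the crash to that call.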

tcalmant · Jan 29 '14 08:01

OK, I've reproduced the bug: the thread that calls the JVM is not attached to it, so calls into JVM internals fail. The bug comes from boilerpipe (see below).

First, the monkey patch: in the code you posted on Stack Overflow, you just have to add the following before the extractor is created:

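```python
from __future__ import print_function

import jpype
from boilerpipe.extract import Extractor

class ExtractingContent:
    @classmethod
    def processingContent(cls, sourceUrl, extractorType="DefaultExtractor"):
        # Attach this request thread before boilerpipe calls into Java.
        print("State=", jpype.isThreadAttachedToJVM())
        if not jpype.isThreadAttachedToJVM():
            print("Needs to attach...")
            jpype.attachThreadToJVM()
            print("Check Attached=", jpype.isThreadAttachedToJVM())
        extractor = Extractor(extractor=extractorType, url=sourceUrl)
        return extractor.getText()
```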
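The workaround above can be packaged as a decorator so that every entry point touching Java attaches its calling thread first. This is a minimal sketch, assuming the JPype 0.5.x functions used in the snippet (`isThreadAttachedToJVM()` and `attachThreadToJVM()`). The name `jvm_thread` is hypothetical, and the JPype module is passed in as a parameter so the helper can be exercised without a JVM.

```python
import functools


def jvm_thread(jvm):
    """Hypothetical decorator factory: attach the calling thread if needed.

    `jvm` is any object exposing isThreadAttachedToJVM() and
    attachThreadToJVM(), e.g. the `jpype` module in JPype 0.5.x.
    """
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Attachment is per thread, so check on every call instead of
            # once at import time or based on how many threads exist.
            if not jvm.isThreadAttachedToJVM():
                jvm.attachThreadToJVM()
            return func(*args, **kwargs)
        return wrapper
    return decorate


# Usage with the real libraries (not executed here):
#
# import jpype
# from boilerpipe.extract import Extractor
#
# @jvm_thread(jpype)
# def extract_text(url, extractor_type="DefaultExtractor"):
#     return Extractor(extractor=extractor_type, url=url).getText()
```

The check is cheap, so running it on every call is simpler and safer than tracking which threads have already been attached.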

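About boilerpipe: the check `if threading.activeCount() > 1` in boilerpipe/extract/__init__.py, line 50, is wrong. The calling thread must always be attached to the JVM, even when it is the only active thread, so the check should be on `jpype.isThreadAttachedToJVM()` rather than on the number of threads.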
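To make the point concrete, here is a small stdlib-only model, not JPype itself. A fake JVM records which threads are attached. The JVM is "started" (attaching its creating thread) in a worker thread that then exits, which leaves the main thread as the only active thread. `FakeJVM`, `needs_attach_by_count` and `needs_attach_by_state` are hypothetical names chosen for this illustration.

```python
import threading


class FakeJVM(object):
    """Stand-in that tracks attachment per thread, like a real JVM does."""

    def __init__(self):
        self._attached = set()

    def start(self):
        # The thread that creates the JVM is attached automatically.
        self._attached.add(threading.current_thread().ident)

    def isThreadAttachedToJVM(self):
        return threading.current_thread().ident in self._attached

    def attachThreadToJVM(self):
        self._attached.add(threading.current_thread().ident)


def needs_attach_by_count():
    # The heuristic quoted above: only attach when several threads run.
    return threading.active_count() > 1


def needs_attach_by_state(jvm):
    # The per-thread check: attach whenever *this* thread is not attached.
    return not jvm.isThreadAttachedToJVM()


def demo():
    jvm = FakeJVM()
    starter = threading.Thread(target=jvm.start)
    starter.start()
    starter.join()  # the JVM's creating thread has now exited
    return needs_attach_by_count(), needs_attach_by_state(jvm)
```

In a plain single-threaded script, `demo()` returns `(False, True)`: the count-based check skips an attach that the calling thread actually needs, while the per-thread check does not.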

tcalmant · Jan 29 '14 22:01

@tcalmant Thanks for the patch, it's working fine :) @originell I think you can close the issue in jpype, since it comes from boilerpipe.

rshiva · Jan 30 '14 10:01

Alright! Thanks for the clarification!

originell · Jan 30 '14 17:01

@tcalmant Hey, I'm running the same example in production with nginx and uWSGI, but it breaks right at this line:

```python
extractor = Extractor(extractor=extractorType, url=sourceUrl)
```

The log doesn't show any error; it just gets stuck there. It works fine on its own, though, as a script in the Python console. Any idea?
(java version "1.7.0_51")

rshiva · Feb 24 '14 09:02

To give more details: it fails silently when we run it through python-rq (extracting articles in the background). I can see the rq worker running, but nothing really happens.

jimishjoban · Feb 28 '14 12:02

Hi, I'm not a python-rq or nginx/uWSGI expert :( Could you provide a test case?
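A test case for a silent hang is easier to share if it cannot block forever. This is a minimal sketch, assuming Python 3.5+ (`subprocess.run` with `timeout`). It runs a repro script in a fresh interpreter and reports whether it finished, failed, was killed by a signal (on POSIX, a JVM crash typically ends with an abort), or hung. `run_repro` is a hypothetical helper. It only classifies the outcome and does not explain it.

```python
import subprocess
import sys


def run_repro(script, timeout=30.0):
    """Run `script` in a fresh Python interpreter and classify the outcome.

    Returns a (status, returncode, output) tuple where status is one of:
      "ok"      - exited with code 0
      "error"   - exited with a non-zero code (e.g. an uncaught exception)
      "signal"  - killed by a signal (negative returncode on POSIX)
      "timeout" - still running after `timeout` seconds, then killed
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", script],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep tracebacks in the same output
            timeout=timeout,
        )
    except subprocess.TimeoutExpired as exc:
        # run() kills the child on timeout; output may be partial or None.
        return "timeout", None, exc.output or b""
    if proc.returncode == 0:
        status = "ok"
    elif proc.returncode < 0:
        status = "signal"
    else:
        status = "error"
    return status, proc.returncode, proc.stdout
```

The script string would hold the import of `Extractor` and the call that hangs. A fresh interpreter only reproduces the standalone case, so the rq or uWSGI setup would still have to be recreated inside the script for the hang to show up.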

tcalmant · Feb 28 '14 13:02