goose3 stops working at the 3rd execution

Open system1system2 opened this issue 2 years ago • 4 comments

I installed goose3 (via pip3 install goose3) in a-Shell and wrote a Python script that takes a URL as input and prints the article.cleaned_text attribute that goose3 extracts from it. I run it as follows:

python3 news.py https://url-of-choice

The script works fine for the first two executions. On the third, I get the error below, and the only way to recover is to manually kill a-Shell and restart it:

Traceback (most recent call last):
  File "/private/var/mobile/Library/Mobile Documents/iCloud~AsheKube~app~a-Shell/Documents/ReadThis/news.py", line 2, in <module>
    from goose3 import Goose;
  File "/var/mobile/Containers/Data/Application/E9DE2EBA-7BF5-4D20-8795-AD19AC53E4ED/Library/lib/python3.9/site-packages/goose3/__init__.py", line 27, in <module>
    from goose3.configuration import ArticleContextPattern, Configuration, PublishDatePattern  # noqa: F401
  File "/var/mobile/Containers/Data/Application/E9DE2EBA-7BF5-4D20-8795-AD19AC53E4ED/Library/lib/python3.9/site-packages/goose3/configuration.py", line 27, in <module>
    from goose3.parsers import Parser, ParserSoup
  File "/var/mobile/Containers/Data/Application/E9DE2EBA-7BF5-4D20-8795-AD19AC53E4ED/Library/lib/python3.9/site-packages/goose3/parsers.py", line 25, in <module>
    import lxml.html
  File "/private/var/containers/Bundle/Application/E33369CF-E4C8-4248-969A-79B03C138937/a-Shell.app/Library/lib/python3.9/site-packages/lxml/html/__init__.py", line 87, in <module>
    _rel_links_xpath = etree.XPath("descendant-or-self::a[@rel]|descendant-or-self::x:a[@rel]",
TypeError: 'NoneType' object is not callable

system1system2, Oct 29 '21 13:10

Hi, I'm trying to reproduce this, but running "from goose3 import Goose" on its own is not enough (the issue does not appear). I might need the script you are running (news.py).

holzschu, Oct 29 '21 13:10

Sorry:

import sys
from goose3 import Goose;
url = sys.argv[1]

g = Goose()
article = g.extract(url=url)
print(article.cleaned_text)

system1system2, Oct 29 '21 14:10

Thanks. I found the issue (lxml was not cleaning up when leaving). It will be fixed in the next build.

holzschu, Oct 29 '21 16:10
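
The diagnosis above (lxml not cleaning up when the interpreter exits) can be illustrated with a toy model. This is only a sketch under assumptions: it assumes, as the need to kill and restart a-Shell suggests, that successive python3 runs share one app process, so state kept outside the interpreter survives between runs. Every name in it (NativeState, import_extension, shutdown_interpreter, run_twice) is hypothetical, none of it is lxml or a-Shell code, and it does not try to reproduce why the real failure appeared on the third run rather than the second.

```python
# Toy model only: every name here is hypothetical and none of this is
# lxml or a-Shell code. It mimics a native extension whose static state
# outlives the interpreter that created it.


class NativeState:
    """Stands in for C-level static variables that live as long as the process."""
    initialized = False
    xpath_factory = None


def import_extension(state):
    """Simulate importing the extension inside a fresh interpreter."""
    if not state.initialized:
        # First import in this process: build the callable and remember it.
        state.xpath_factory = lambda expr: f"compiled({expr})"
        state.initialized = True
    # Mirrors the failing step in the traceback: a module-level XPath(...) call.
    return state.xpath_factory("descendant-or-self::a[@rel]")


def shutdown_interpreter(state, clean_up):
    """Simulate interpreter teardown: Python objects die, static flags may not."""
    state.xpath_factory = None  # the Python-level object is destroyed
    if clean_up:
        state.initialized = False  # proper cleanup: reset the flag as well


def run_twice(clean_up):
    """Run two simulated 'python3 news.py' invocations in the same process."""
    state = NativeState()
    results = []
    for _ in range(2):
        try:
            results.append(import_extension(state))
        except TypeError as exc:
            results.append(f"TypeError: {exc}")
        shutdown_interpreter(state, clean_up)
    return results


if __name__ == "__main__":
    print(run_twice(clean_up=False))  # second run hits the stale state
    print(run_twice(clean_up=True))   # both runs succeed
```

Without cleanup, the second simulated run skips initialization because the flag is still set, then calls a name that is now None, which raises the same kind of TypeError as in the traceback. With cleanup, each run starts from a fresh state.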

Hi, the next build is now available on TestFlight (https://testflight.apple.com/join/WUdKe3f4). It should be working now.

holzschu, Nov 22 '21 12:11