xA-Scraper
Error Scraping Patreon
I am trying to get this to scrape patreon, but every time it runs the scheduled scrape, I get this error:
Main.Runtime - INFO - Scheduler executing class: <class 'xascraper.modules.patreon.patreonScrape.GetPatreon'>
ScraperBase Init
Starting up
Main.WebRequest - INFO - Using global chromium tab pool
Main.WebRequest - INFO - User agent overridden!
Starting up?
Main.WebRequest - INFO - Using global chromium tab pool
apscheduler.executors.default - ERROR - Job "pat (trigger: interval[0:05:00], next run at: 2022-01-22 01:30:00 CST)" raised an exception
Traceback (most recent call last):
File "/home/bob/xA-Scraper/venv/lib/python3.6/site-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "./main_scrape.py", line 37, in runScraper
instance = scraper_class()
File "/home/bob/xA-Scraper/xascraper/modules/patreon/patreonScrape.py", line 62, in __init__
'api_key': settings["captcha"]["anti-captcha"]['api_key'],
KeyError: 'anti-captcha'
Main.Runtime - INFO - Job crashed: 1e2773451533411e98cfafc059f03fe0
Main.Runtime - INFO - Traceback: File "/home/derek/xA-Scraper/venv/lib/python3.6/site-packages/apscheduler/executors/base.py", line 125, in run_job
retval = job.func(*job.args, **job.kwargs)
File "./main_scrape.py", line 37, in runScraper
instance = scraper_class()
File "/home/bob/xA-Scraper/xascraper/modules/patreon/patreonScrape.py", line 62, in __init__
'api_key': settings["captcha"]["anti-captcha"]['api_key'],
Any idea how to fix that? I am running ubuntu 20.04. Also... is there any way to force the scraper to run without having to set the timer to a low refresh? I had to set it to five minutes to get it to run again so I could get that error copied. Thanks.
Also... is there any way to force the scraper to run without having to set the timer to a low refresh?
python3 -m manage run pat?
Did you delete the relevant line from the example config?
You don't need a valid key at the moment (the actual codepath that uses it is stubbed), but the key still has to be present in the config. Patreon sometimes hits you with a recaptcha, which I use anti-captcha.com to deal with elsewhere.
The patreon scraper is fairly finicky. It REQUIRES being run in a full desktop environment, and having the google-chrome chromium binary present. Running chromium in a full desktop session works around some of the weird client sniffing garbage webshit assholes do these days.
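For reference, a minimal sketch of why the lookup in the traceback fails when the block is commented out. Only the key path settings["captcha"]["anti-captcha"]["api_key"] comes from the traceback; the dict layout around it, the placeholder value, and the get_anticaptcha_key helper are assumptions made for illustration, not the project's actual code.

```python
# Key path taken from the traceback:
#   settings["captcha"]["anti-captcha"]["api_key"]
# The surrounding structure and the placeholder value are assumptions.
settings_with_block = {
    "captcha": {
        "anti-captcha": {"api_key": "placeholder-not-a-real-key"},
    },
}

# Roughly what the lookup sees once the "anti-captcha" block is commented out.
settings_without_block = {"captcha": {}}


def get_anticaptcha_key(settings):
    """Hypothetical helper: same lookup as the traceback, with a clearer error."""
    try:
        return settings["captcha"]["anti-captcha"]["api_key"]
    except KeyError as err:
        raise KeyError(
            "settings is missing %s; restore the captcha/anti-captcha block "
            "from the example config (a placeholder api_key is enough)" % err
        ) from None


print(get_anticaptcha_key(settings_with_block))

try:
    get_anticaptcha_key(settings_without_block)
except KeyError as err:
    print("lookup failed:", err)
```

So commenting the block out has the same effect as deleting it: the key is looked up when the scraper class is constructed, whether or not the value is ever used.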
I didn't delete it, but I did comment that part out. The error I copied and pasted was after commenting that out.
python3 -m manage run pat?
Oh, by the way... when I run that, I get:
Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/bob/xA-Scraper/manage/__main__.py", line 15, in <module>
from . import name_importer
File "/home/bob/xA-Scraper/manage/name_importer.py", line 6, in <module>
import psycopg2
ModuleNotFoundError: No module named 'psycopg2'
Huh. Did you not install everything in requirements.txt? psycopg2-binary should provide the psycopg2 package, even if it's not really used if you're using sqlite.
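A quick way to check whether the interpreter you are launching can actually see psycopg2 is sketched below. The missing_modules helper is a hypothetical name, and the suggestion to re-run pip install -r requirements.txt assumes you run it with the same interpreter or virtualenv you use for python3 -m manage.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(["psycopg2"])
if missing:
    # Assumption: the dependencies live in requirements.txt next to the code.
    print("Missing:", ", ".join(missing), "- try: pip install -r requirements.txt")
else:
    print("psycopg2 is importable")
```

If this reports psycopg2 as missing under plain python3 but not under the virtualenv's python, the manage command is probably being run outside the virtualenv.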