pmaw
signal only works in main thread
Hello, I'm currently developing a very simple Flask app, running only locally, and I wanted to scrape some Reddit posts using your API. I followed the example as specified in the documentation; however, whenever I run my script, I get the following error:
ValueError: signal only works in main thread
I read that the Flask-SocketIO package causes this, but I saw that this project uses websocket-client, which is a different package.
Would really appreciate your input.
hey @rosendyakov, can you provide the following info, along with the minimum amount of code needed to re-create the issue? This will help as I look into it further:
Python version:
Flask version:
pmaw version:
I have the same issue when deploying a web application that uses pmaw:
File "/app/adam-radar/Python-Scripts/User Specified Scripts/Discussion Platforms/Reddit/reddit_submissions_by_keywords.py", line 111, in reddit_submissions api_request_generator = api.search_submissions(q=keyword,after=start_time,before=end_time) File "/home/appuser/venv/lib/python3.9/site-packages/pmaw/PushshiftAPI.py", line 77, in search_submissions return self._search(kind="submission", **kwargs) File "/home/appuser/venv/lib/python3.9/site-packages/pmaw/PushshiftAPIBase.py", line 304, in _search self.req.check_sigs() File "/home/appuser/venv/lib/python3.9/site-packages/pmaw/Request.py", line 110, in check_sigs signal.signal(getattr(signal, "SIG" + sig), self._exit) File "/usr/local/lib/python3.9/signal.py", line 56, in signal handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
I am hitting the same symptom, though my setup is a bit more involved (Azure Durable Functions), so to make up for the added complexity I published my repro at https://github.com/mike-mo/azure-durable-pmaw
Python version: 3.10.10
Azure Functions Core Tools version: 4.0.5030
Azure Functions runtime version: 4.15.2.20177
pmaw version: 3.0.0
The same Python and pmaw versions work fine in a basic script that fetches the data, so it must be something to do with how threading is handled by these frameworks.
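For what it's worth, the restriction comes from CPython itself rather than from any particular framework: `signal.signal()` may only be called from the main thread of the main interpreter, and Flask request handlers, Azure Functions workers, and thread-pool workers all typically run off the main thread. A minimal, pmaw-free demonstration of the same ValueError:

```python
import signal
import threading

def install_handler():
    # pmaw's Request.check_sigs() does essentially this (see the traceback above).
    try:
        signal.signal(signal.SIGTERM, lambda signum, frame: None)
        print(f"{threading.current_thread().name}: handler installed")
    except ValueError as e:
        print(f"{threading.current_thread().name}: {e}")

install_handler()                             # MainThread: handler installed

t = threading.Thread(target=install_handler)  # worker thread: ValueError
t.start()
t.join()
```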
Having the same issue here; I'm using multiprocessing.pool.ThreadPool to call the API function:
```python
from multiprocessing.pool import ThreadPool
from pathlib import Path
import logging

from pandas import DataFrame, concat
from pmaw import PushshiftAPI

logger = logging.getLogger(__name__)
api = PushshiftAPI()


def run_download(subreddits: list, start_date: int, end_date: int,
                 additional_args: dict, working_dir: Path = None) -> DataFrame:
    logger.info(f"Starting download from Reddit using subreddits {subreddits}")
    all_df = DataFrame()
    with ThreadPool() as pool:
        # Each worker thread queries one subreddit; search_submissions raises
        # the ValueError as soon as pmaw tries to install its signal handlers
        # from the worker thread.
        lst_df = pool.starmap(
            _get_subreddit,
            [(start_date, end_date, subreddit, additional_args)
             for subreddit in subreddits])
    for df in lst_df:
        if df.empty:
            continue
        all_df = concat([df, all_df], axis=0)
    return all_df


def _get_subreddit(start_time: int, end_time: int, subreddit_name: str,
                   params: dict) -> DataFrame:
    df = DataFrame(
        api.search_submissions(subreddit=subreddit_name, since=start_time,
                               until=end_time, **params))
    return df.drop_duplicates('id')
```
I have also tried this with an async function, but it didn't help either.
ValueError: signal only works in main thread of the main interpreter
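In case it helps, here is a workaround sketch that avoids the error from worker threads. This is not an official pmaw option: it assumes the handlers installed by check_sigs() are only there for graceful-shutdown cleanup (the handler in the traceback above is Request._exit), so stubbing them out means you lose that cleanup:

```python
import signal
import threading
from unittest import mock

def call_off_main_thread(func, *args, **kwargs):
    """Run func, suppressing signal-handler installation off the main thread."""
    if threading.current_thread() is threading.main_thread():
        # On the main thread, pmaw can install its handlers normally.
        return func(*args, **kwargs)
    # Off the main thread, temporarily replace signal.signal with a no-op
    # mock so pmaw's check_sigs() succeeds without touching real signals.
    with mock.patch.object(signal, "signal"):
        return func(*args, **kwargs)

# Hypothetical usage inside _get_subreddit:
#   submissions = call_off_main_thread(
#       api.search_submissions, subreddit=subreddit_name,
#       since=start_time, until=end_time, **params)
```

Note that `mock.patch.object` swaps the attribute process-wide for the duration of the call, so this is only safe if nothing on the main thread installs real handlers at the same moment.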
Did you manage to solve this? I'm running into a similar issue using an Azure Durable Function with Scrapy and signal.
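One option that may sidestep the restriction entirely, for Scrapy as well as pmaw (a sketch only; I haven't verified it inside Durable Functions): move the scrape into a child process. Tasks submitted to a ProcessPoolExecutor run on the child process's main thread, where installing signal handlers is allowed. The subreddit name and epoch timestamps below are just example values:

```python
from concurrent.futures import ProcessPoolExecutor

def scrape(subreddit: str, since: int, until: int) -> list:
    # Import inside the worker so each child process builds its own client.
    from pmaw import PushshiftAPI
    api = PushshiftAPI()
    # This runs on the child's main thread, so check_sigs() succeeds.
    return list(api.search_submissions(subreddit=subreddit,
                                       since=since, until=until))

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        posts = executor.submit(scrape, "python", 1672531200, 1675209600).result()
        print(f"fetched {len(posts)} submissions")
```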