Errors downloading from a paid subscription
(Using the latest source code from 2024-03-29)
I requested three posts, but only one downloaded successfully; the others hit the following errors.
FIRST post download hit exception at
def get_url_soup(self, url: str) -> BeautifulSoup:
    """
    Gets soup from URL using logged in selenium driver
    """
    try:
        self.driver.get(url)  # <<< EXCEPTION HIT HERE
        return BeautifulSoup(self.driver.page_source, "html.parser")
    except Exception as e:
        raise ValueError(f"Error fetching page: {e}") from e
CALLSTACK
get_url_soup (/Users/username/Dev/Substack2Markdown/substack_scraper.py:341)
scrape_posts (/Users/username/Dev/Substack2Markdown/substack_scraper.py:228)
main (/Users/username/Dev/Substack2Markdown/substack_scraper.py:394)
OUTPUT
0%| | 0/3 [00:00<?, ?it/s]Error scraping post: Error fetching page: Message: no such execution context
(Session info: MicrosoftEdge=123.0.2420.65)
Stacktrace:
0 msedgedriver 0x0000000104bc99d8 msedgedriver + 4823512
1 msedgedriver 0x0000000104bc1a13 msedgedriver + 4790803
2 msedgedriver 0x0000000104787d35 msedgedriver + 359733
3 msedgedriver 0x000000010477434a msedgedriver + 279370
4 msedgedriver 0x00000001047732a3 msedgedriver + 275107
5 msedgedriver 0x00000001047736df msedgedriver + 276191
6 msedgedriver 0x0000000104781fa4 msedgedriver + 335780
7 msedgedriver 0x000000010479211b msedgedriver + 401691
8 msedgedriver 0x00000001047968ab msedgedriver + 420011
9 msedgedriver 0x0000000104773c8b msedgedriver + 277643
10 msedgedriver 0x0000000104791da0 msedgedriver + 400800
11 msedgedriver 0x000000010480887f msedgedriver + 886911
12 msedgedriver 0x00000001047ec543 msedgedriver + 771395
13 msedgedriver 0x00000001047c0dbf msedgedriver + 593343
14 msedgedriver 0x00000001047c171e msedgedriver + 595742
15 msedgedriver 0x0000000104b85f32 msedgedriver + 4546354
16 msedgedriver 0x0000000104b8c2c6 msedgedriver + 4571846
17 msedgedriver 0x0000000104b67d5a msedgedriver + 4423002
18 msedgedriver 0x0000000104b8cd2d msedgedriver + 4574509
19 msedgedriver 0x0000000104b583d4 msedgedriver + 4359124
20 msedgedriver 0x0000000104bb0348 msedgedriver + 4719432
21 msedgedriver 0x0000000104bb04c1 msedgedriver + 4719809
22 msedgedriver 0x0000000104bc15a7 msedgedriver + 4789671
23 libsystem_pthread.dylib 0x00007ff803e6818b _pthread_start + 99
24 libsystem_pthread.dylib 0x00007ff803e63ae3 thread_start + 15
33%|███████████████████████████████████████████████████████████████▎ | 1/3 [00:16<00:32, 16.22s/it]Error scraping post: 'NoneType' object has no attribute 'text'
67%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 2/3 [00:50<00:25, 25.26s/it]
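A possible workaround for the first failure, untested against this repository: "no such execution context" can be transient, for example when the page navigates while the driver command is in flight, so retrying the fetch a few times may help. The sketch below is a generic retry helper; `with_retries`, `attempts`, and `delay_s` are hypothetical names that do not exist in substack_scraper.py.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(action: Callable[[], T], attempts: int = 3, delay_s: float = 2.0) -> T:
    """Call `action` up to `attempts` times, sleeping `delay_s` seconds between failures.

    Hypothetical helper for illustration; not part of Substack2Markdown.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as e:  # broad on purpose, mirroring get_url_soup
            last_error = e
            if attempt < attempts:
                time.sleep(delay_s)
    # Every attempt failed: surface the last error in the same style as get_url_soup.
    raise ValueError(
        f"Error fetching page after {attempts} attempts: {last_error}"
    ) from last_error
```

Assuming a scraper instance named `scraper`, a call site could look like `soup = with_retries(lambda: scraper.get_url_soup(url))`. If the error turns out to be deterministic (for instance a login or paywall redirect), retrying will not help and only delays the failure.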
SECOND post download hit exception at
def scrape_posts(self, num_posts_to_scrape: int = 0) -> None:
    """
    Iterates over all posts and saves them as markdown files
    """
    ...
    title, subtitle, like_count, date, md = self.extract_post_data(soup)  # <<< EXCEPTION HIT HERE
CALLSTACK
scrape_posts (/Users/username/Dev/Substack2Markdown/substack_scraper.py:232)
main (/Users/username/Dev/Substack2Markdown/substack_scraper.py:394)
OUTPUT 33%|████████████████████████████▎ | 1/3 [02:02<04:05, 122.79s/it]Error scraping post: 'NoneType' object has no attribute 'text'
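For context on the second failure: `'NoneType' object has no attribute 'text'` is what Python raises when `.text` is read from `None`, and BeautifulSoup lookups such as `find()` and `select_one()` return `None` when nothing matches. One guess, not confirmed from the source, is that `extract_post_data` looks up an element that is missing on these pages. A minimal sketch of a guard, where `safe_text` is a hypothetical helper and not part of the repository:

```python
from typing import Any, Optional


def safe_text(element: Optional[Any], default: str = "") -> str:
    """Return the stripped `.text` of a parsed element, or `default` if the lookup found nothing.

    Hypothetical helper for illustration; `element` is expected to be the result
    of a BeautifulSoup lookup, which is None when no tag matches.
    """
    if element is None:
        return default
    return element.text.strip()
```

A call site might then read `title = safe_text(soup.select_one("h1"), "Untitled")`, where the `"h1"` selector is only a placeholder. This hides the crash but not the cause: if the title element is missing, the page that was fetched is probably not the full post (for example a paywall or login page).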
I also tried the 2.0.0 release, and the same issue occurs there.
Hmm, thanks for this. Would you be able to share the Substack where you hit these exceptions? Were the posts that failed premium posts, and was the one that downloaded successfully free? I'm busy until early May but will look into this after that.