
Add option to download images

Open 64bitpandas opened this issue 1 year ago • 0 comments

I'm using this tool to mirror some of my Substack posts to my website, and as part of that process I'd really like to host my own images instead of having them link to the Substack CDN!

In case this will help someone else, here's a PR 🙂

Here's a list of some tweaks I made to get that to happen:

  • Add an --images flag that will download images for all posts being scraped into a substack_images/ folder
  • Add an option to download a single post (by passing in a --url in the format https://example.substack.com/p/postname)
  • Substack nests images inside links, like [![alt](/path)](/path). When downloading images, change these to just be ![alt](/path) so that clicking on an image doesn't link to the image itself.
  • Add some tests, to prove to myself this code works the way I expect it
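
The image un-nesting step could look roughly like the sketch below. This is only an illustration, not the code from the PR: the function name `unwrap_image_links` and the regular expression are made up for this example, and it assumes the link targets contain no spaces or closing parentheses.

```python
import re

# Hypothetical pattern (not taken from the PR): a Markdown image that is
# wrapped in a link, i.e. [![alt](/img-path)](/link-path).
# Group 1 captures the inner image, ![alt](/img-path).
NESTED_IMAGE = re.compile(r"\[(!\[[^\]]*\]\([^)\s]+\))\]\([^)\s]+\)")


def unwrap_image_links(markdown: str) -> str:
    """Replace each link-wrapped image with just the inner image."""
    return NESTED_IMAGE.sub(r"\1", markdown)


if __name__ == "__main__":
    print(unwrap_image_links("Intro [![alt](/path)](/path) outro"))
```

Plain images and ordinary text links don't match the pattern, so they pass through unchanged.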

As a bonus, the progress bars reflect image downloads (since they can take a while)! As an example:

Scraping posts: 100%|██████████| 2/2 [00:30<00:00, 15.00s/post]
  Downloading images for test-post: 100%|██████████| 7/7 [00:14<00:00, 2.00s/image]
  Downloading images for another-post: 100%|██████████| 4/4 [00:08<00:00, 2.00s/image]

64bitpandas · Dec 31 '24 09:12