html-posts-to-markdown
Save your online HTML posts as Markdown using Puppeteer
html posts to markdown
A Node.js tool that extracts HTML posts from web pages using Puppeteer, converts them to Markdown, and saves them.
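To give a rough idea of what the conversion step involves, here is a deliberately small HTML-to-Markdown converter. It is a hypothetical illustration built on regular expressions, not the code this tool uses: the name `htmlToMarkdown` is made up, it only handles headings, links, bold, italics, paragraphs and `<br>`, and it does not decode HTML entities or cope with lists, images or deeply nested markup. A real converter should work on a parsed DOM.

```javascript
// Hypothetical, regex-based HTML-to-Markdown converter for illustration only.
// Handles headings, links, bold, italics, <br> and paragraphs; any other tag
// is stripped and only its text content is kept. HTML entities are NOT decoded.
function htmlToMarkdown(html) {
  return html
    // <h1>..<h6> become "#" to "######" headings followed by a blank line.
    .replace(/<h([1-6])[^>]*>([\s\S]*?)<\/h\1>/gi,
      (_, level, text) => `\n${"#".repeat(Number(level))} ${text.trim()}\n\n`)
    // <a href="x">text</a> becomes [text](x).
    .replace(/<a\s[^>]*href="([^"]*)"[^>]*>([\s\S]*?)<\/a>/gi, "[$2]($1)")
    // Bold and italic.
    .replace(/<(strong|b)>([\s\S]*?)<\/\1>/gi, "**$2**")
    .replace(/<(em|i)>([\s\S]*?)<\/\1>/gi, "_$2_")
    // <br> becomes a Markdown hard line break (two trailing spaces).
    .replace(/<br\s*\/?>/gi, "  \n")
    // A closing </p> ends the paragraph with a blank line.
    .replace(/<\/p>/gi, "\n\n")
    // Strip any remaining tags and collapse runs of blank lines.
    .replace(/<[^>]+>/g, "")
    .replace(/\n{3,}/g, "\n\n")
    .trim();
}
```

For example, `<h1>Hello</h1><p>See <a href="https://example.com">this</a>.</p>` becomes a `# Hello` heading followed by the paragraph `See [this](https://example.com).`.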
Disclaimer
I haven't really tested this and there are many things missing, but it works for my use case.
Setup
- Clone the repository
- Run `npm install`
Usage
node index.js --url="https://justmarkup.com" --postSelector=".main .article h2 a" --titleSelector=".article h1" --contentSelector=".article .entry-content" --dir="/posts/"
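The selectors in this command play different roles: `--postSelector` is evaluated on the entry page given by `--url` to find the post links, while `--titleSelector` and `--contentSelector` are evaluated on each post page. The sketch below shows one way to wire that up. It relies only on the Puppeteer `Page` methods `goto()`, `$$eval()` and `$eval()`; the function name `collectPosts` and the shape of the returned objects are hypothetical and not taken from `index.js`.

```javascript
// Hypothetical sketch: `page` is any object exposing Puppeteer's Page methods
// goto(), $$eval() and $eval(), e.g. the result of `await browser.newPage()`.
async function collectPosts(page, { url, postSelector, titleSelector, contentSelector }) {
  // 1. Open the entry page and collect the (absolute) href of every post link.
  await page.goto(url);
  const links = await page.$$eval(postSelector, (anchors) => anchors.map((a) => a.href));

  // 2. Visit each post in turn and read its title text and content HTML.
  //    $eval throws if a selector matches nothing on the page.
  const posts = [];
  for (const link of links) {
    await page.goto(link);
    const title = await page.$eval(titleSelector, (el) => el.textContent.trim());
    const html = await page.$eval(contentSelector, (el) => el.innerHTML);
    posts.push({ url: link, title, html });
  }
  return posts;
}
```

With Puppeteer installed, you could call it as `const browser = await puppeteer.launch(); const posts = await collectPosts(await browser.newPage(), options);` and then `await browser.close()`. Visiting the posts one after another keeps a single tab open; opening pages in parallel would be faster but puts more load on the site.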
Options
| Option | Default | Description |
|---|---|---|
| `--url` | `https://justmarkup.com` | The entry page containing links to the posts |
| `--postSelector` | `.main .article h2 a` | The selector for all the links to your posts |
| `--titleSelector` | `.article h1` | The selector for the title of your post |
| `--contentSelector` | `.article .entry-content` | The selector for the content wrapper of your post |
| `--dir` | `/posts/` | The directory where the posts should be saved |
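The options are passed as `--name=value` pairs. In a POSIX shell the double quotes in the usage example are removed before Node.js sees them, so each entry of `process.argv` looks like `--url=https://justmarkup.com`. A minimal, hypothetical parser that falls back to the defaults listed in the options table might look like this; the names `DEFAULTS` and `parseOptions` are made up, and the actual `index.js` may parse its arguments differently, for example with a library.

```javascript
// Hypothetical defaults, copied from the options table.
const DEFAULTS = {
  url: "https://justmarkup.com",
  postSelector: ".main .article h2 a",
  titleSelector: ".article h1",
  contentSelector: ".article .entry-content",
  dir: "/posts/",
};

// Turns ["--url=https://x", "--dir=/out/"] into an options object.
// Unknown flags are ignored; everything after the first "=" is the value.
function parseOptions(argv) {
  const options = { ...DEFAULTS };
  for (const arg of argv) {
    const match = /^--([A-Za-z]+)=(.*)$/.exec(arg);
    // hasOwnProperty avoids accepting inherited names such as "toString".
    if (match && Object.prototype.hasOwnProperty.call(DEFAULTS, match[1])) {
      options[match[1]] = match[2];
    }
  }
  return options;
}
```

In a script you would call `parseOptions(process.argv.slice(2))`, skipping the first two entries, which hold the Node.js executable and the script path.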