digital-resources
OG scraper for list of URLs
To add the content from all 1600+ links in the document, we need an Open Graph and metadata scraper that can:
- parse through all 1600+ remote URL resources
- fetch the meta information (minimum: `title`, `description`, `og:image`) for each URL
- export a Markdown file for each link inside `./src/content/resources/`, following the schema in `./src/content/config.ts`, and name the `.md` file based on the meta title, in a safe way (alphanumeric + `_` or `-`) to avoid duplicates for resources with identical titles
- download the OG image, name it identically to the Markdown file, and place it in the same folder as the Markdown file
- (optional) use the OpenAI ChatGPT API to optimize the description and draft more extensive descriptive content
- (optional) use the MidJourney API to generate a unique OG image based on the original (if one exists) or on the first fold of the remote URL
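
The file-naming and export requirements above could be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the `ResourceMeta` fields and the frontmatter keys (`title`, `description`, `url`, `image`) are assumptions and must be aligned with the actual schema in `./src/content/config.ts`; the helper names (`safeSlug`, `uniqueSlug`, `imageFileName`, `toMarkdown`) and the `.jpg` fallback extension are hypothetical.

```typescript
// Hypothetical shape; the real fields must match ./src/content/config.ts.
interface ResourceMeta {
  url: string;
  title: string;
  description: string;
  image: string | undefined; // value of og:image, if any
}

// Reduce a title to lowercase alphanumerics plus "_" and "-".
function safeSlug(title: string): string {
  const slug = title
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "") // strip combining accents
    .toLowerCase()
    .replace(/[^a-z0-9_-]+/g, "-") // any other run of characters becomes "-"
    .replace(/-{2,}/g, "-")
    .replace(/^-+|-+$/g, "");
  return slug || "untitled";
}

// Append -2, -3, ... when two resources share the same title.
function uniqueSlug(title: string, used: Set<string>): string {
  const base = safeSlug(title);
  let candidate = base;
  let n = 2;
  while (used.has(candidate)) {
    candidate = `${base}-${n}`;
    n += 1;
  }
  used.add(candidate);
  return candidate;
}

// Name the image after the Markdown file, keeping the remote extension.
// Falling back to "jpg" is an assumption for URLs without a known extension.
function imageFileName(slug: string, imageUrl: string): string {
  const path = imageUrl.split(/[?#]/)[0] ?? "";
  const ext = /\.(png|jpe?g|gif|webp|avif|svg)$/i.exec(path)?.[1];
  return `${slug}.${(ext ?? "jpg").toLowerCase()}`;
}

// Render the Markdown file body. JSON.stringify yields double-quoted
// strings, which are also valid YAML scalars.
function toMarkdown(meta: ResourceMeta, imageFile?: string): string {
  const lines = [
    "---",
    `title: ${JSON.stringify(meta.title)}`,
    `description: ${JSON.stringify(meta.description)}`,
    `url: ${JSON.stringify(meta.url)}`,
  ];
  if (imageFile) lines.push(`image: ${JSON.stringify(`./${imageFile}`)}`);
  lines.push("---", "");
  return lines.join("\n");
}
```

Sharing one `Set` of used slugs across the whole run keeps the Markdown and image names unique, so the two files for a resource can be written side by side in `./src/content/resources/` as `<slug>.md` and `<slug>.<ext>`.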
Potential APIs and packages
- [x] https://www.npmjs.com/package/meta-fetcher (13.6kb)
- [ ] https://www.npmjs.com/package/fetch-opengraph (14kb)
- [ ] https://www.npmjs.com/package/url-metadata (22.7kb)
- [ ] https://www.npmjs.com/package/open-graph-scraper (88.3kb)
- [ ] https://www.npmjs.com/package/isomorphic-unfetch (3.38kb) vs https://www.npmjs.com/package/axios (1770kb); see also https://www.zenrows.com/blog/axios-web-scraping
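
When comparing the packages above, a dependency-free baseline may help to check what each one returns. The sketch below extracts the minimum fields (`title`, `description`, `og:image`) from an HTML string that has already been downloaded with any HTTP client. It does not use any of the listed packages. The names `PageMeta`, `parseAttributes`, `decodeEntities` and `extractMeta` are hypothetical, and the regex-based parsing is an assumption that holds only for ordinary `<meta>` tags; it is not a full HTML parser.

```typescript
interface PageMeta {
  title: string | undefined;
  description: string | undefined;
  image: string | undefined;
}

// Decode the handful of entities that commonly appear in meta content.
function decodeEntities(text: string): string {
  const named: Record<string, string> = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };
  return text.replace(/&(#\d+|[a-z]+);/gi, (whole: string, code: string) => {
    if (code.charAt(0) === "#") return String.fromCodePoint(Number(code.slice(1)));
    return named[code.toLowerCase()] ?? whole;
  });
}

// Parse the attributes of a single tag into a lowercase-keyed record.
function parseAttributes(tag: string): Record<string, string> {
  const attrs: Record<string, string> = {};
  const re = /([a-zA-Z:_-]+)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(tag)) !== null) {
    const name = (m[1] ?? "").toLowerCase();
    attrs[name] = m[2] ?? m[3] ?? m[4] ?? "";
  }
  return attrs;
}

// Collect <meta> tags (first occurrence wins) and fall back to <title>
// and the plain "description" tag when the Open Graph ones are missing.
function extractMeta(html: string): PageMeta {
  const found: Record<string, string> = {};
  for (const tag of html.match(/<meta\b[^>]*>/gi) ?? []) {
    const attrs = parseAttributes(tag);
    const key = (attrs["property"] ?? attrs["name"] ?? "").toLowerCase();
    const content = attrs["content"];
    if (key && content !== undefined && !(key in found)) {
      found[key] = decodeEntities(content);
    }
  }
  const titleTag = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html)?.[1]?.trim();
  return {
    title: found["og:title"] ?? (titleTag ? decodeEntities(titleTag) : undefined),
    description: found["og:description"] ?? found["description"],
    image: found["og:image"],
  };
}
```

Keeping the parsing step separate from the HTTP request means the same function can be run against the HTML returned by either HTTP client under consideration, and its output compared field by field with what a dedicated scraper package reports for the same URL.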