lazyblorg
derive list of external URLs
Add an additional command line parameter similar to `--external-urls $FILE`. This `$FILE` gets populated with CSV content like:
http://foo.example.com/an/old/path; id:2020-04-12-article-id
https://bar.example.com; id:another-id
This can be used as input for any external URL checker in order to find broken links.
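As a rough illustration of how a URL checker could consume such a file, the following Python sketch parses lines in the format shown above into `(url, article_id)` pairs. The function name `parse_external_urls` is hypothetical and not part of lazyblorg; the `; id:` separator is taken from the example lines and may change once the format is decided.

```python
from typing import Iterable, Iterator, Tuple


def parse_external_urls(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (url, article_id) pairs from lines like 'URL; id:ARTICLE-ID'.

    Hypothetical helper: the '; id:' separator is assumed from the
    example in this issue, not from an implemented lazyblorg format.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the LAST '; id:' so semicolons inside the URL survive.
        url, sep, article_id = line.rpartition("; id:")
        if not sep:
            continue  # line is not in the expected format; ignore it
        yield url.strip(), article_id.strip()


sample = [
    "http://foo.example.com/an/old/path; id:2020-04-12-article-id",
    "https://bar.example.com; id:another-id",
]
pairs = list(parse_external_urls(sample))
```

Each resulting pair can then be handed to whatever link checker is in use, and a broken URL can be traced back to its article via the ID.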
The implementation might relate to #41 and #44, which might also extract data without generating the actual blog data (not decided yet).
- [x] decide: accept duplicate entries for URL/ID combinations and/or for URLs only?
- [x] decide: only as a by-product of a normal lazyblorg generation run or also as a stand-alone feature?
An approach could be to combine this with #44 in the following way:
A command line option causes multiple files to be generated alongside a normal blog generation run:
- `${BLOG_NAME}_URLs.csv`: a CSV file with tabs as separators (which don't appear in the data), containing one line per URL with:
  - ID of the article
  - the optional link description (sanitized for tabs)
  - an external URL referred to in the article (sanitized for tabs)
- `${BLOG_NAME}_files.csv`: a CSV file with tabs as separators (which don't appear in the data), containing one line per referenced file with:
  - ID of the article
  - the optional link description (sanitized for tabs)
  - an external file reference in the article (sanitized for tabs): mostly image files
- [ ] absolute file name versus relative or ts-basename?
- [ ] how to deal with non-matching basenames within the Org file which got fixed by the built-in "fuzzy" search?
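
The tab-separated files proposed above could be written along the following lines. This is a minimal Python sketch, not the lazyblorg implementation: `sanitize`, `write_tsv`, and the `Row` tuple are hypothetical names, and replacing tabs and line breaks with a single space is just one possible way to do the "sanitized for tabs" step.

```python
import io
import re
from typing import Iterable, Optional, TextIO, Tuple

# (article ID, optional description, URL or file reference)
Row = Tuple[str, Optional[str], str]

_TAB_OR_NEWLINE = re.compile(r"[\t\r\n]+")


def sanitize(field: Optional[str]) -> str:
    """Replace runs of tabs/line breaks with one space; map None to ''."""
    if field is None:
        return ""
    return _TAB_OR_NEWLINE.sub(" ", field)


def write_tsv(out: TextIO, rows: Iterable[Row]) -> None:
    """Write one tab-separated line per row: ID, description, target."""
    for article_id, description, target in rows:
        fields = (sanitize(article_id), sanitize(description), sanitize(target))
        out.write("\t".join(fields) + "\n")


rows = [
    ("2020-04-12-article-id", "old\tpath", "http://foo.example.com/an/old/path"),
    ("another-id", None, "https://bar.example.com"),  # no description given
]
buffer = io.StringIO()
write_tsv(buffer, rows)
output = buffer.getvalue()
```

Because every field is sanitized before joining, a reader can split each line on tabs and always get exactly three columns, with an empty middle column when a link has no description.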