derive list of external URLs

Open novoid opened this issue 4 years ago • 2 comments

Add an additional command line parameter similar to --external-urls $FILE. This $FILE gets populated with CSV content like:

http://foo.example.com/an/old/path; id:2020-04-12-article-id
https://bar.example.com; id:another-id

This can be used as input for any external URL checker in order to find broken links.
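A minimal sketch of how such a file could be written and read back, assuming the `URL; id:ARTICLE-ID` line layout shown above. The function names are hypothetical and not part of lazyblorg; the parser splits on the last `; id:` so that a semicolon inside a URL does not break the line.

```python
import io
from typing import Iterable, TextIO, Tuple

SEPARATOR = "; id:"  # taken from the sample lines above


def format_url_line(url: str, article_id: str) -> str:
    """Build one output line: '<URL>; id:<article id>'."""
    return f"{url}{SEPARATOR}{article_id}"


def parse_url_line(line: str) -> Tuple[str, str]:
    """Split a line back into (url, article_id), using the last separator."""
    url, sep, article_id = line.rstrip("\n").rpartition(SEPARATOR)
    if not sep:
        raise ValueError(f"not a URL line: {line!r}")
    return url, article_id


def write_external_urls(pairs: Iterable[Tuple[str, str]], handle: TextIO) -> None:
    """Write one line per (url, article_id) pair to an open text handle."""
    for url, article_id in pairs:
        handle.write(format_url_line(url, article_id) + "\n")


if __name__ == "__main__":
    buffer = io.StringIO()
    write_external_urls(
        [
            ("http://foo.example.com/an/old/path", "2020-04-12-article-id"),
            ("https://bar.example.com", "another-id"),
        ],
        buffer,
    )
    print(buffer.getvalue(), end="")
```

An external link checker would then only need `parse_url_line()` to get the URL to test and the article ID to report.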

Implementation might relate to #41 and #44, which might also extract data without generating the actual blog data (not decided yet).

novoid, Apr 12 '20 08:04

  • [x] decide: accept duplicate entries for URL/ID combinations and/or for URLs only?
  • [x] decide: only as a by-product of a normal lazyblorg generation run or also as a stand-alone feature?

novoid, Oct 04 '20 10:10

An approach could be to combine this with #44 in the following way:

A command line option results in multiple files generated in parallel to a normal blog generation run:

  1. ${BLOG_NAME}_URLs.csv: a CSV file with tabs as separators (tabs don't appear in the data), one line per URL, each containing:
     • ID of the article
     • the optional description of the link (sanitized for tabs)
     • an external URL referred to in the article (sanitized for tabs)
  2. ${BLOG_NAME}_files.csv: a CSV file with tabs as separators (tabs don't appear in the data), one line per referenced file, each containing:
     • ID of the article
     • the optional description of the link (sanitized for tabs)
     • an external file referenced in the article (sanitized for tabs): mostly image files
       • [ ] absolute file name versus relative or ts-basename?
       • [ ] how to deal with non-matching basenames within the Org file which got fixed by the built-in "fuzzy" search?
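
A rough sketch of the URL file described above. All names here are hypothetical, not lazyblorg's actual code, and it makes two assumptions: external links appear as bracketed Org links (`[[URL][description]]` or `[[URL]]`), and "sanitized for tabs" means replacing tabs and line breaks with a single space. Bare URLs outside brackets are not handled.

```python
import io
import re
from typing import List, TextIO, Tuple

# Assumption: only bracketed http(s) Org links are of interest.
ORG_LINK = re.compile(r"\[\[(https?://[^\]\[]+)\](?:\[([^\]]*)\])?\]")


def url_file_name(blog_name: str) -> str:
    """File name following the ${BLOG_NAME}_URLs.csv pattern."""
    return f"{blog_name}_URLs.csv"


def sanitize(field: str) -> str:
    """Replace tabs and line breaks so a field cannot break the column layout."""
    return re.sub(r"[\t\r\n]", " ", field)


def extract_external_links(org_text: str) -> List[Tuple[str, str]]:
    """Return (url, description) pairs; description is '' when absent."""
    return [(m.group(1), m.group(2) or "") for m in ORG_LINK.finditer(org_text)]


def write_url_rows(article_id: str, org_text: str, handle: TextIO) -> int:
    """Write one tab-separated line per link: ID, description, URL."""
    count = 0
    for url, description in extract_external_links(org_text):
        fields = (article_id, description, url)
        handle.write("\t".join(sanitize(f) for f in fields) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    sample = (
        "See [[https://bar.example.com][Bar\tsite]] and "
        "[[http://foo.example.com/an/old/path]]."
    )
    out = io.StringIO()
    write_url_rows("another-id", sample, out)
    print(url_file_name("myblog"))
    print(out.getvalue(), end="")
```

The lines are joined by hand rather than with the `csv` module because `csv.writer` with `quoting=csv.QUOTE_NONE` raises an error for fields containing a quote character unless an escape character is configured. The files variant could reuse `sanitize()` and the same column order once the open questions about file names are settled.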

novoid, Oct 24 '20 16:10