core
processor decorator: bundle module documentation
We often have lots of useful documentation for processors which the ocrd-tool.json does not and cannot cover:
- README file
- DITA files
- other documentation (publications, research notes, markdown files) in the repo
It would be great if these could be made available to users directly (without the need for them to find out the project URL and browse the repo). This could be done via some entry point (showing URLs or even man-pages) on the command line with an extra CLI option, say --about, or under --help.
Ideas:
- dump the README verbatim (can be read in pager on command line), or print its repo URL
- show the local path to DITA output (if bundled), like PDF or troff files (which should be suitable for browsing with man on the command line)
- just show repo URL
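As a rough illustration of the --about idea, here is a minimal sketch using only argparse from the standard library. Everything in it is an assumption for the sake of the example: the program name ocrd-foo, the placeholder repo URL, and the helper names about_text, build_parser and main are hypothetical, and a real processor would take the URL from its ocrd-tool.json and hook into whatever CLI framework it already uses.

```python
import argparse
from pathlib import Path


def about_text(readme_path, repo_url):
    """Return the bundled README if it exists, otherwise a pointer to the repo."""
    if readme_path is not None and Path(readme_path).is_file():
        return Path(readme_path).read_text(encoding="utf-8")
    return f"Documentation: {repo_url}"


def build_parser():
    # "ocrd-foo" is a placeholder processor name
    parser = argparse.ArgumentParser(prog="ocrd-foo")
    parser.add_argument("--about", action="store_true",
                        help="show the module documentation (or its URL) and exit")
    return parser


def main(argv=None, readme_path=None, repo_url="https://example.org/ocrd-foo"):
    # repo_url is a placeholder; readme_path would point at a bundled README
    args = build_parser().parse_args(argv)
    if args.about:
        print(about_text(readme_path, repo_url))
        return 0
    # normal processing would continue here
    return 0
```

With no README bundled, `main(["--about"])` falls back to printing the repo URL, which covers the last idea in the list.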
Related: https://github.com/OCR-D/spec/issues/119
Since we require README.md, this could be implemented right away. ocrd-foo --help-readme to filter markdown through pygments to a PAGER for example.
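A minimal sketch of what --help-readme might do, assuming the README text has already been located somehow. The function names render_markdown and show_readme are hypothetical. Pygments is treated as optional here, so the sketch falls back to plain text when it is not installed; paging goes through pydoc.pager from the standard library.

```python
import pydoc


def render_markdown(text):
    """Colorize markdown for the terminal if pygments is available."""
    try:
        from pygments import highlight
        from pygments.formatters import TerminalFormatter
        from pygments.lexers import MarkdownLexer
    except ImportError:
        # assumption: plain text is an acceptable fallback
        return text
    return highlight(text, MarkdownLexer(), TerminalFormatter())


def show_readme(text):
    """Send the (possibly colorized) README to the user's pager."""
    pydoc.pager(render_markdown(text))
```

Depending on the pager, the ANSI color codes may need an extra option to display properly (for example less -R).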
I am not sure about the state of DITA documentation for processors. We'll need to wait for @tboenig to be back in office.
A section in the help message with URLs to GitHub project/README.md and possibly links to models and other documentation. ocrd-tool has fields for git_url and dockerhub for the image name. We could either extend that or just add a generic text-href mapping of Links to show in this section.
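A sketch of how such a Links section could be rendered, assuming the tool description is available as a dict with the existing git_url and dockerhub fields. The function name format_links_section, the extra_links parameter (standing in for the proposed generic text-href mapping) and the labels are all hypothetical.

```python
def format_links_section(ocrd_tool, extra_links=None):
    """Build a 'Links:' block for the help message from ocrd-tool fields."""
    links = {}
    if ocrd_tool.get("git_url"):
        links["Source"] = ocrd_tool["git_url"]
    if ocrd_tool.get("dockerhub"):
        # dockerhub holds the image name, not a URL
        links["Docker image"] = ocrd_tool["dockerhub"]
    # hypothetical generic text -> href mapping
    links.update(extra_links or {})
    if not links:
        return ""
    width = max(len(label) for label in links)
    lines = ["Links:"]
    lines += [f"  {label:<{width}}  {href}" for label, href in links.items()]
    return "\n".join(lines)
```

The result is a plain string, so it could be appended to the existing help text or printed on its own.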
Since we require README.md, this could be implemented right away.
But modules usually don't bundle it into their Python distro, do they?
If they packaged it under PEP 566 Description / long_description and Description-Content-Type / long_description_content_type, then we should in principle be able to access it from Python though:
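import pkg_resources
# __module__ stands in for the distribution name of the processor package
pkg = pkg_resources.get_distribution(__module__)
meta = '\n'.join(pkg._get_metadata(pkg.PKG_INFO))
# slice from just after the header name up to the next header
description = meta[meta.find('Description: ')+13:meta.find('Platform:')]
# take only the rest of the Description-Content-Type line
dtype = meta[meta.find('Description-Content-Type: ')+26:].split('\n', 1)[0]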
(Python 3.8 would make life easier here via importlib.metadata...)
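For comparison, a sketch of the importlib.metadata route (Python 3.8+). The helper names extract_description and long_description are hypothetical. One assumption worth flagging: newer metadata versions may store the description in the message body instead of a Description header, so the sketch checks both.

```python
from importlib import metadata  # Python 3.8+


def extract_description(msg):
    """Return (text, content_type) from a parsed PKG-INFO / METADATA message."""
    text = msg.get("Description")
    if not text:
        # newer metadata may carry the description as the message body
        payload = msg.get_payload()
        text = payload if isinstance(payload, str) else None
    return (text or None), msg.get("Description-Content-Type")


def long_description(dist_name):
    """Look up an installed distribution; (None, None) if it is not installed."""
    try:
        msg = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None, None
    return extract_description(msg)
```

This avoids the string slicing and the private _get_metadata call, but it still only helps if the module actually set long_description when packaging.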
ocrd-foo --help-readme to filter markdown through pygments to a PAGER for example.
Yes, or --readme or --doc. There's also a package mrkd BTW which builds on pygments to output troff (suitable for man browser).
A section in the help message with URLs to GitHub project/README.md and possibly links to models and other documentation.
Yes, that's still not too long. Or defer all that to --about (and including version, author/copyright, license etc).
ocrd-tool has fields for git_url and dockerhub for the image name.
Right!
Cf. #623 for a first partial attempt.