processor decorator: bundle module documentation

Open bertsky opened this issue 5 years ago • 4 comments

We often have lots of useful documentation for processors which the ocrd-tool.json does not and cannot cover:

  • README file
  • DITA files
  • other documentation (publications, research notes, markdown files) in the repo

It would be great if these could be made available to users directly (without the need for them to find out the project URL and browse the repo). This could be done via some entry point (showing URLs or even man-pages) on the command line with an extra CLI option, say --about, or under --help.

Ideas:

  • dump the README verbatim (can be read in pager on command line), or print its repo URL
  • show the local path to DITA output (if bundled), like PDF or troff files (which should be suitable for browsing with man on the command line)
  • just show repo URL

bertsky, Jul 15 '20 11:07

Related: https://github.com/OCR-D/spec/issues/119

bertsky, Jul 15 '20 11:07

Since we require README.md, this could be implemented right away. ocrd-foo --help-readme to filter markdown through pygments to a PAGER for example.
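A minimal sketch of what such a --help-readme handler might do, assuming the README text has already been obtained somehow. The function names are hypothetical, and pygments is treated as optional so the text is shown unstyled when it is not installed:

```python
import pydoc


def render_markdown(text: str) -> str:
    """Return `text` with ANSI highlighting if pygments is available, else unchanged."""
    try:
        from pygments import highlight
        from pygments.formatters import TerminalFormatter
        from pygments.lexers import MarkdownLexer
    except ImportError:
        # pygments missing (or too old to ship a Markdown lexer): plain text
        return text
    return highlight(text, MarkdownLexer(), TerminalFormatter())


def show_readme(text: str) -> None:
    """Send the rendered README through the user's pager."""
    # pydoc.pager picks a pager for the current terminal and falls back
    # to plain printing when output is not interactive
    pydoc.pager(render_markdown(text))
```

Calling show_readme(readme_text) from the option callback would then be enough; how readme_text is located is a separate question.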

I am not sure about the state of DITA documentation for processors. We'll need to wait for @tboenig to be back in office.

A section in the help message with URLs to GitHub project/README.md and possibly links to models and other documentation. ocrd-tool has fields for git_url and dockerhub for the image name. We could either extend that or just add a generic text-href mapping of Links to show in this section.
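As a rough sketch of the second variant: the function below builds a "Links" section from the existing git_url and dockerhub fields plus a generic text-to-href mapping. The function name, the labels and the extra mapping are hypothetical, not part of the current ocrd-tool schema:

```python
from typing import Mapping, Optional


def format_links(tool: Mapping[str, str],
                 links: Optional[Mapping[str, str]] = None) -> str:
    """Build a 'Links' help section from ocrd-tool fields and extra links."""
    entries = {}
    if tool.get("git_url"):
        entries["Repository"] = tool["git_url"]
    if tool.get("dockerhub"):
        entries["Docker image"] = tool["dockerhub"]
    # hypothetical generic text -> href mapping (models, papers, ...)
    entries.update(links or {})
    if not entries:
        return ""
    width = max(len(label) for label in entries)
    lines = ["Links:"]
    lines += [f"  {label.ljust(width)}  {href}" for label, href in entries.items()]
    return "\n".join(lines)
```

The returned string could simply be appended to the existing --help text.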

kba, Jul 15 '20 13:07

> Since we require README.md, this could be implemented right away.

But modules usually don't bundle it into their Python distro, do they?

If they packaged it under PEP 566 Description / long_description and Description-Content-Type / long_description_content_type, then we should in principle be able to access it from Python though:

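import pkg_resources
# assumption: the distribution is named after the module's top-level package
pkg = pkg_resources.get_distribution(__name__.split('.')[0])
meta = '\n'.join(pkg._get_metadata(pkg.PKG_INFO))
# take everything between the 'Description: ' and 'Platform:' headers
description = meta[meta.find('Description: ')+13:meta.find('Platform:')]
# keep only the rest of the header line, not the rest of the file
dtype = meta[meta.find('Description-Content-Type: ')+26:].split('\n', 1)[0]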

(Python 3.8 would make life easier here via importlib.metadata...)
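For comparison, a sketch of the importlib.metadata route. The function names are hypothetical, and the sample metadata at the end is made up purely to show the call. Since the metadata is exposed as an email-style message, no manual string slicing is needed:

```python
from email import message_from_string
from email.message import Message
from importlib import metadata  # Python >= 3.8
from typing import Optional, Tuple


def split_description(meta: Message) -> Tuple[str, Optional[str]]:
    """Return (description, content type) from parsed core metadata."""
    # The long description is either a 'Description' header or,
    # with newer metadata writers, the body of the message.
    description = meta.get("Description") or meta.get_payload()
    return description, meta.get("Description-Content-Type")


def read_description(dist_name: str) -> Tuple[str, Optional[str]]:
    """Look up an installed distribution by name and extract its description."""
    return split_description(metadata.metadata(dist_name))


# Made-up metadata, only to demonstrate the parsing step
sample = message_from_string(
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Description-Content-Type: text/markdown\n"
    "\n"
    "# Demo\n\nBody text.\n"
)
print(split_description(sample))
```

read_description() raises importlib.metadata.PackageNotFoundError when the name does not match an installed distribution, so a caller would probably want to catch that and fall back to printing the repo URL.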

> ocrd-foo --help-readme to filter markdown through pygments to a PAGER for example.

Yes, or --readme or --doc. There's also a package mrkd BTW which builds on pygments to output troff (suitable for man browser).

> A section in the help message with URLs to GitHub project/README.md and possibly links to models and other documentation.

Yes, that's still not too long. Or defer all that to --about (also including version, author/copyright, license, etc.).
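A small sketch of how an --about flag could be wired up. It uses argparse as a stand-in for whatever CLI framework the processor decorator actually uses; the function names, the field selection and the sample values are all placeholders:

```python
import argparse
from typing import Mapping


def format_about(meta: Mapping[str, str]) -> str:
    """Format selected core-metadata fields, skipping missing or empty ones."""
    fields = ("Name", "Version", "Author", "License", "Home-page")
    return "\n".join(f"{key}: {meta[key]}" for key in fields if meta.get(key))


def build_parser() -> argparse.ArgumentParser:
    """Parser with a hypothetical --about switch."""
    parser = argparse.ArgumentParser(prog="ocrd-foo")
    parser.add_argument("--about", action="store_true",
                        help="show version, author, license and project URL")
    return parser


args = build_parser().parse_args(["--about"])
if args.about:
    # placeholder values; real ones would come from the package metadata
    print(format_about({"Name": "ocrd-foo", "Version": "0.0.0",
                        "License": "Apache-2.0"}))
```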

> ocrd-tool has fields for git_url and dockerhub for the image name.

Right!

bertsky, Jul 15 '20 14:07

Cf. #623 for a first partial attempt.

bertsky, Oct 09 '20 08:10