api-doc-tools

Feature request: Lexical canonicalization of XML with mdoc

Open MichaelNorman opened this issue 8 years ago • 8 comments

It would be nice to be able to round-trip API docs through mdoc in order to standardize white space, attributes, and so on. For example:

mdoc --canonicalize ./Application.xml

would result in Application.xml being loaded into whatever XML libraries mdoc uses and written back to itself. Something like:

mdoc --canonical-diff-only ./Application.xml

would do the same thing, except report what it would do to stdout, rather than writing back to Application.xml.
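As a rough illustration of the round trip described above, here is a minimal Python sketch. It is not how mdoc works internally. It assumes W3C C14N 2.0 (via the standard library's `xml.etree.ElementTree.canonicalize`, Python 3.8+) as a stand-in for whatever output settings mdoc would really use, and the function names `canonical_form` and `canonicalize_file` are hypothetical. Passing `write=False` plays the role of the proposed diff-only mode, reporting whether a file would change without touching it.

```python
import xml.etree.ElementTree as ET


def canonical_form(xml_text):
    """Return the C14N 2.0 form of an XML string.

    Assumption: C14N 2.0 stands in for mdoc's own writer settings.
    Note that C14N output omits the XML declaration.
    """
    return ET.canonicalize(xml_text)


def canonicalize_file(path, write=True):
    """Round-trip one file; return True if its text would change.

    With write=False the file is left untouched (a "diff-only" check).
    """
    with open(path, encoding="utf-8") as f:
        original = f.read()
    canonical = canonical_form(original)
    changed = canonical != original
    if changed and write:
        with open(path, "w", encoding="utf-8") as f:
            f.write(canonical)
    return changed
```

Two files that differ only in attribute order, quoting, or spacing inside tags come out identical, which is the property that keeps commits free of incidental churn.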

If the filespec were omitted, perhaps the default behavior would be to recurse over the current directory. Since it is likely that this command would be run on altered but not committed files, it might make sense to have an explicit --no-confirm option, and have the default mode be interactive.

The motivation for this is to reduce clutter in commits and to allow greater freedom for authors to script their own solutions to adding/removing/changing content with the XML toolchain of their choice, simply canonicalizing their files afterward. Also, more modern XML-handling libraries, such as LINQ to XML, have arrived on the scene, so this would enhance the writers' scripting experience while allowing mdoc and its developers to remain locked in the past with XmlDocument and its ilk, unperturbed by requests to update the XML engine to ease writers' pain.

MichaelNorman • Apr 12 '17 17:04

this is a great idea ... it's common for teams that do extracurricular XML processing on the EcmaXML to simply run mdoc update multiple times in order to accomplish this same goal. This would allow users who do not have access to the assemblies required to run update to keep things in line.

I'd propose this be implemented as an MdocCommand ... so the usage would be mdoc canonicalize. And perhaps using a -whatif option to trigger the "diff-only" mode you mentioned above. This is in line with how PowerShell often gives you the ability to check what a command is going to do without actually doing it.

joelmartinez • Apr 12 '17 17:04

I'm +1 on this. It allows confident integration of processing scripts, i.e., mdoc --initialState | someScript | mdoc canonicalize

Update: Is this mdoc normalize?

lobrien • Apr 12 '17 19:04

For the -whatif option, we'd need some form of doing the diffs (in order to show what changed) ... There are a number of diff libraries available, but does anyone have any thoughts on good ones?
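To make the -whatif idea concrete, here is a small sketch of the kind of output such a mode could produce. It uses Python's standard `difflib` purely for illustration; it is not a recommendation of a particular .NET diff library, and the function name `preview_changes` and the "(canonical)" label are hypothetical.

```python
import difflib


def preview_changes(original, rewritten, path):
    """Return a unified diff of what a rewrite would change.

    Returns an empty string when the two texts are identical,
    so callers can treat "" as "nothing to do".
    """
    diff = difflib.unified_diff(
        original.splitlines(),
        rewritten.splitlines(),
        fromfile=path,
        tofile=path + " (canonical)",
        lineterm="",  # lines carry no newline; join adds them back
    )
    return "\n".join(diff)
```

A line-based diff like this is only readable if the writer emits one element per line; a character-level diff would be needed for files written on a single line.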

joelmartinez • Apr 12 '17 19:04

Why is this not the existing function mdoc normalize?

It appears to me that normalize is not integrated into mdoc itself and is therefore invisible to users (given that none of us seems to have noticed it). I'd suggest we integrate it, make sure we have tests, make sure it appears in the help text, etc. (Also, adding -whatif is still a good idea, if a bit more ambitious.)

lobrien • Apr 12 '17 19:04

hah! honestly, I'd literally never even seen that mdoc command (I've been focused on update and assemble commands :P ).

Yes this does accomplish part of what has been requested ... it loads the xml, and saves it. However, it doesn't use the same settings as what mdoc update uses ... so this would have to be adjusted, in addition to adding the -whatif feature.

joelmartinez • Apr 12 '17 19:04

oh, womp womp ... that command, while present in the codebase, isn't even added as a subcommand when the application is initialized. That might explain why I'd never noticed it :)

joelmartinez • Apr 12 '17 19:04

Just a note to say that it would be nice to be able to specify a single file, directory, or list of files. The user might not always want to operate on every file, both for time/perf reasons and because they may have other files open (or half-edited with mismatched tags, or...).

MichaelNorman • Apr 13 '17 17:04

Revisiting this. The above discussion makes me wonder if, instead of integrating it as a subcommand, we just have a trivial CLI connected to a DLL with a public API. The CLI works on a file list, so just use normal UNIX / PowerShell piping to specify things. The CLI just reads the file as an XDocument and writes it using a standard XmlWriter with whatever options we desire.
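The shape of such a file-list CLI might look like the sketch below. It is written in Python rather than against XDocument/XmlWriter, with the standard library's C14N 2.0 writer standing in for the "whatever options we desire" part. The names `read_paths`, `rewrite`, and `main` are hypothetical. Paths are taken from the arguments if any are given, otherwise one per line from stdin.

```python
import sys
import xml.etree.ElementTree as ET


def read_paths(argv, stdin):
    """Use command-line paths if given, else one path per stdin line."""
    if argv:
        return list(argv)
    return [line.strip() for line in stdin if line.strip()]


def rewrite(path):
    """Rewrite one file in canonical form; return True if it changed."""
    with open(path, encoding="utf-8") as f:
        original = f.read()
    # Assumption: C14N 2.0 stands in for the real writer settings.
    canonical = ET.canonicalize(from_file=path)
    if canonical == original:
        return False
    with open(path, "w", encoding="utf-8") as f:
        f.write(canonical)
    return True


def main(argv, stdin=sys.stdin):
    """Process every path and print the ones that were rewritten."""
    for path in read_paths(argv, stdin):
        if rewrite(path):
            print(path)
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

Saved under a hypothetical name such as canon.py, it could be fed from a pipe, e.g. `git diff --name-only | python canon.py`, which leaves the choice of files entirely to the shell.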

lobrien • Jul 13 '18 18:07