Various strange things in oddbyexample stylesheet
The oddbyexample stylesheet in Tools is a potentially useful way of automatically generating an ODD that documents the encoding practice of a corpus of TEI documents. I've been testing it while writing a new tutorial on the subject (http://teic.github.io/TCW/howtoGenerate.html). So far I've noticed the following things in need of attention:
- the parameter "method", which supposedly determines whether the generated ODD uses @include or @except, in fact has no effect at all. It should probably be removed.
- there are several parameters (keepGlobals, enumerateType, enumerateRend, et al.) controlling which attributes should be provided with valLists in the output, and it's not clear how they interact, whether they are all necessary, or indeed whether any of them works
- the generated ODD includes null declarations for every class declared by a module, whether any elements from that module are actually present in the corpus or not
IIRC, the original oddbyexample always used @except, which was not ideal; Sebastian changed it to use @include instead. The method param was intended to support the original behaviour, but presumably never got implemented.
Not sure whether this is a bug or my own ineptitude, but when I point defaultSource to 'tei_simplePrint.odd' (stored locally), the output is not confidence-inspiring.
Before I go on a wild-goose chase, could someone confirm that this is or isn't how it is supposed to work? My goal is to create a myGenerated.odd that deletes/adds elements and classes based on a comparison of my example corpus and simplePrint instead of tei_all.
Shouldn't defaultSource point to the Guidelines (i.e. the released copy or a local copy)? So pointing to an ODD won't work, but a 'compiled ODD' like p5subset.xml might. If you run oddtorelaxng with --odd and --debug, it should leave a copy of the compiled ODD behind, which you can then point at as a source. But I'm not sure why you would want to do this. It generates an ODD from a corpus; if it then includes things you don't want, edit the resulting ODD to remove them. Am I misunderstanding?
(All untested and from a phone so not necessarily accurate!)
I believe @jamescummings is right that the "source" has to be a compiled ODD. But this may indicate a shortcoming in the documentation. Are you doing this from the command line, @duncdrum?
Actually, I'm doing this from inside eXist-db, but @jamescummings' comment got me on the right track.
The reason I want to be able to point to simplePrint is user-friendliness. If a user has a TEI file that uses simplePrint, oddbyexample will just duplicate all the simplePrint modifications in myExample, without differentiating between the simplePrint customization and the user's actual customization.
Obviously not a major problem, but if I can make it configurable it would be nicer.
@hcayless Yes, mentioning in the documentation that the source must be a compiled ODD sounds like a good idea. Would that go into Lou's tutorial (http://teic.github.io/TCW/howtoGenerate.html)?
If you want to derive your ODD from simplePrint, you need to supply a compiled version of simplePrint as the value of the @source attribute, as James says. See further http://teic.github.io/TCW/howtoChain.html . It hadn't occurred to me that you might also want to constrain oddbyexample to use a different source, but setting @defaultSource to point to a compiled ODD certainly ought to work (pointing to a non-compiled one definitely won't). Does it? As to making this more explicit in the documentation, there is no documentation. I am happy to add a comment to my existing tutorial, but it seems to me that quite a few other aspects of the existing spec (which is essentially the list of xsl:params at the start of the stylesheet) need attention too, which was the point of raising this ticket.
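A minimal sketch of what pointing @defaultSource at a compiled ODD could look like from the command line, modelled on the Saxon invocation quoted at the end of this thread. The wrapper name run_oddbyexample and the SAXON_JAR/ODDBYEXAMPLE_XSL variables are hypothetical, and the sketch assumes that oddbyexample's defaultSource parameter accepts a local file path (which Saxon resolves to a file: URI) as well as a URL. The file passed in must already be compiled (e.g. p5subset.xml, or simplePrint compiled with odd2odd.xsl).

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run oddbyexample.xsl over a corpus directory, comparing it
# against a *compiled* ODD supplied as defaultSource (an uncompiled .odd won't work).
# Default paths follow the layout of the command line quoted in this thread.
run_oddbyexample() {
  local corpus_dir="$1" default_source="$2" output="$3"
  java -jar "${SAXON_JAR:-./Stylesheets/lib/saxon9he.jar}" \
    "${ODDBYEXAMPLE_XSL:-./Stylesheets/tools/oddbyexample.xsl}" \
    -o:"$output" \
    -it:main \
    corpusList="$(realpath "$corpus_dir")" \
    defaultSource="$(realpath "$default_source")"
}
```

For example, `run_oddbyexample ./corpus ./simplePrintSubset.xml ./myGenerated.odd` would generate an ODD relative to a locally compiled simplePrint rather than the default source.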
@lb42 thanks for the help. To answer your question, I would like to add the odd-by-example functionality to tei-publisher, where a lot of stuff is already based on simplePrint. So I would like to give users the option to choose either simplePrint or tei_all as the base for their custom ODD. I understand that it's not strictly necessary.
Pointing to the uncompiled tei_simplePrint.odd does not raise any errors; the output is valid but incomplete (roughly 90% of the declarations are missing).
Using odd2odd.xsl to compile simplePrintSubset.xml from the ODD does work, but it results in an invalid file (90 errors: 88 invalid empty <attList/>s, one misplaced <p>, one misplaced <datatype>). Feeding this to oddbyexample does not work.
So the method param seems to work correctly, but I'm not sure how to proceed with the validation errors in the compiled simplePrint file. I also tried pointing to 'http://www.tei-c.org/Vault/P5/current/xml/tei/Exemplars/tei_simplePrint.doc.xml', since it was the only simplePrint XML file I could find in the Vault, but that didn't work either.
Thanks for reporting this. I had noticed the bugs in odd2odd earlier but failed to do anything about them. There are at least two bugs, which I will describe in a separate ticket, since they have nothing to do with this one (except insofar as you hit them too!). To answer your question: simplePrint.doc.xml is just the prose wrapped around the simplePrint ODD. To correct the errors in the compiled tei_simplePrint.odd, (a) delete all empty attLists, (b) move the misplaced <datatype> ahead of its sibling <constraintSpec>, and (c) move the misplaced <p> outside its parent <specGrp>. Or wait for someone to take action on #319.
FYI, I have run some stress tests, and after diffing the results I am happy to report that oddbyexample seems to work as expected with different defaultSources (provided they are valid). I'll open a PR that adds the name of the default source to the output's header. Since it's user-configurable, I think it should be mentioned somewhere.
With the release of tei-publisher 3.1.0, users can now run oddbyexample transformations from its UI (or via their own XQuery code); see the documentation.
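Step (a) is mechanical enough to script. A rough sketch, assuming the empty attLists are unprefixed and each one sits on a single line (as either <attList/> or <attList></attList>, possibly with attributes); the function name strip_empty_attlists is hypothetical, and a proper XML tool would be more robust. Steps (b) and (c) still have to be done by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper for step (a): delete empty attList elements from a compiled ODD.
# Reads XML on stdin, writes the result to stdout. Line-based, so it only catches
# empty attLists written on one line; it does not touch attLists with children.
strip_empty_attlists() {
  sed -E \
    -e 's#<attList([[:space:]][^>]*)?/>##g' \
    -e 's#<attList([[:space:]][^>]*)?>[[:space:]]*</attList>##g'
}
```

Usage: `strip_empty_attlists < simplePrintSubset.xml > simplePrintSubset.fixed.xml`, then revalidate the result before feeding it to oddbyexample.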

This good news reminds me that something still needs to be done about the fact that odd2odd generates invalid source. Maybe the patch Piotr suggests on #319 is not such a bad idea.
By the way, the corpusList parameter doesn't work as I expected.
I had to use the following funny syntax to make it process a single file only:
```shell
# corpusList is quoted so the shell never tries to glob-expand the '?'
java -jar "./Stylesheets/lib/saxon9he.jar" \
  "./Stylesheets/tools/oddbyexample.xsl" \
  -o:"$OUTPUT_DIR/$output_file" \
  -it:main \
  corpusList="$(realpath "$INPUT_DIR")?select=$INPUT_FILE"
```
(where $INPUT_FILE is a file inside $INPUT_DIR)
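The ?select= suffix is Saxon's collection-URI syntax for filtering the files of a directory, so it presumably works here because oddbyexample hands corpusList to collection() (an assumption, not checked against the stylesheet). A small helper (hypothetical name corpus_list) that builds the value with proper quoting, so neither ? nor a * pattern is ever expanded by the shell:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a corpusList value restricting the corpus to files in
# DIR that match PATTERN (default: all *.xml files), using Saxon's ?select= syntax.
corpus_list() {
  local dir="$1" pattern="${2:-*.xml}" abs
  abs=$(realpath "$dir") || return 1
  printf '%s?select=%s\n' "$abs" "$pattern"
}
```

With it, the last line of the command above becomes `corpusList="$(corpus_list "$INPUT_DIR" "$INPUT_FILE")"`; passing a quoted pattern such as `'letter-*.xml'` instead of a single file name should select several files at once.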