daps Fix #510: Make XPointer work

This PR fixes #510.

Jing fails because it depends on Xerces, and Xerces does not support the `xpointer()` scheme. Therefore, we need to give jing a "complete" document, which is created by xmllint.

Currently, the implementation is ugly, as it appends `.resolved` to the `PROFILED_MAIN` variable. We probably need to adapt the code in other places too.

For the time being, it only fixes the `validate` subcommand.

Addition: what about the following suggestion?

`daps validate` would use the current validation. Nothing changes.
`daps validate --xpointer` would enable a different resolution path for XPointers (resolve them with xmllint and validate with jing).
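To make the two proposed paths concrete, here is a minimal Python sketch of what `daps validate` and `daps validate --xpointer` would run. This is not DAPS code: `build_commands`, `main`, and the `--schema` argument are hypothetical, and the sketch assumes the `jing` wrapper is already configured to process XIncludes through Xerces (that setup is not shown). The `.resolved` suffix mirrors the current implementation described above.

```python
# Sketch only, not DAPS code: build_commands(), main() and --schema are hypothetical.
import argparse
import subprocess
from typing import List, Optional


def build_commands(profiled_main: str, schema: str, xpointer: bool) -> List[List[str]]:
    """Return the commands to run, in order, for one validation mode."""
    if not xpointer:
        # Default path: jing resolves XIncludes itself through Xerces
        # (assumed to be configured), which only knows the element() scheme.
        return [["jing", schema, profiled_main]]
    # --xpointer path: xmllint resolves XIncludes, including xpointer(),
    # and writes the complete document to "<PROFILED_MAIN>.resolved".
    resolved = profiled_main + ".resolved"
    return [
        ["xmllint", "--xinclude", "--output", resolved, profiled_main],
        ["jing", schema, resolved],
    ]


def main(argv: Optional[List[str]] = None) -> int:
    """Parse a `validate`-like command line and run the selected commands."""
    parser = argparse.ArgumentParser(prog="validate")
    parser.add_argument("main", help="profiled main file")
    parser.add_argument("--schema", required=True, help="RELAX NG schema")
    parser.add_argument(
        "--xpointer",
        action="store_true",
        help="resolve XIncludes/XPointers with xmllint before validating",
    )
    args = parser.parse_args(argv)
    for cmd in build_commands(args.main, args.schema, args.xpointer):
        returncode = subprocess.run(cmd).returncode
        if returncode != 0:
            return returncode  # stop at the first failing step
    return 0
```

For example, `main(["--xpointer", "--schema", "schema.rng", "book.xml"])` would run xmllint first and then jing on `book.xml.resolved`, stopping at the first failing command. Without `--xpointer`, only the single jing call runs, so existing projects would see no change.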

May 23 '19 14:05 tomschr

> daps validate --xpointer

@tomschr What is the point of adding an option here instead of just making this the default behavior? Getting validation failures unless you run with an extra option sounds rather inconvenient.

May 27 '19 12:05 ghost

> daps validate --xpointer

> @tomschr What is the point of adding an option here instead of just making this the default behavior? Getting validation failures unless you run with an extra option sounds rather inconvenient.

It is. Actually, I consider it a hack. The whole problem lies in the tools we use:

| Tool | Supported XPointer schemes | RELAX NG validation support |
|---|---|---|
| xmllint | `xpointer()` and `element()` | partial |
| jing | `element()` | full |

Whichever component you use, you will run into problems. :cry:

IMHO, the "default behavior" should be the validation with jing. I guess, not all projects need or want XPointers, so they can stick with the Jing validation. The "extended validation" is mainly needed for SLE.

If we really want to support XPointer (and we already use it), I don't see a better solution yet (except the "option" to fix the tools, but you know how that will end...). Unfortunately, we also have some implementation issues:

Creating a bigfile with xmllint and validating it with jing would technically work. However, if there are validation errors, you lose the filenames (as it's only one file). That makes errors harder to spot.
Validating the document with xmllint alone won't work, as its RELAX NG validation support is only partial. :cry:
Validating the document with jing alone works only as long as nobody uses the `xpointer()` scheme.

So how do we solve this? That's why I suggested this additional option (we could also name it `--full`).

May 27 '19 13:05 tomschr

> IMHO, the "default behavior" should be the validation with jing. I guess, not all projects need or want XPointers, so they can stick with the Jing validation. The "extended validation" is mainly needed for SLE.

Is there any real downside to just applying the XPointer fix in all cases? (Besides the extra second that it may take.)

I am absolutely not in favor of making life harder for everyone by adding extra parameters. I would be much in favor of using the same validation everywhere.

May 27 '19 14:05 ghost

> IMHO, the "default behavior" should be the validation with jing. I guess, not all projects need or want XPointers, so they can stick with the Jing validation. The "extended validation" is mainly needed for SLE.

> Is there any real downside to just applying the XPointer fix in all cases? (Besides the extra second that it may take.)

Well, it depends on what you mean by "real". Not seeing the filenames is a real problem for some, do you agree? It makes debugging harder.

> I am absolutely not in favor of making life harder for everyone by adding extra parameters. I would be much in favor of using the same validation everywhere.

Sure, that would be my aim as well. Unfortunately, the tools are the bottleneck.

I think it boils down to this simple question: do we want to support XPointers? If no, we can close this issue. If yes, we need to find a solution. At the moment we have these options:

Create an intermediate bigfile and validate that.
Find another solution to keep the filename.
Fix the tools.

@fsundermeyer Could we have number 3, please?

May 27 '19 15:05 tomschr

> Not seeing the filenames is a real problem for some, do you agree? It makes debugging harder.

That is a big downside. I missed that part, sorry.

OTOH, don't we have `xml:base` for that? Is there any way we could expose that information via jing?

> I think it boils down to this simple question: do we want to support XPointers?

A definite "yes" from me.

May 27 '19 15:05 ghost

> [...] OTOH, don't we have `xml:base` for that? Is there any way we could expose that information via jing?

Yes, we have `xml:base`. It is added automatically during XInclude resolution. However, exposing it to jing doesn't give us any advantage. It's just an attribute, nothing special.

Unfortunately, I fear we don't have many options:

Validate the bigfile with jing and live with the drawback that we need to do more research when there are validation issues.
Create some magic daps-validate command which does some "tricks" (whatever they will be).
Fix the tools (which would be my preferred option).

As number 3 is probably unlikely or would take too much time, we are left with 1 and 2. If we do not want to accept the drawback of 1, that leaves us with only option 2.

If we were to implement a daps-validate command, it could do the following steps:

Check for well-formedness. If there are any errors, the writer needs to fix them first.
Create a bigfile with xmllint. We need to resolve any XIncludes and XPointers.
Validate the bigfile with jing.
1. If the validation was successful: be happy and dance. :man_dancing:
2. If the validation failed:
   a. Split the bigfile according to the `xml:base` attributes (could be done with some XSLT magic).
   b. Iterate over all files created by the split and validate each one with `jing -i`.
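The proposed flow could be sketched in Python roughly as follows. This is a hypothetical illustration, not an existing DAPS command: `split_by_xml_base` and `validate` are made-up names, the splitting uses ElementTree instead of the XSLT mentioned above, and nested includes simply stay inline inside their parent's fragment (so an error in a nested file may be reported for both files). The `run` parameter defaults to `subprocess.run` but can be swapped out to exercise the logic without xmllint or jing installed.

```python
# Hypothetical sketch of the proposed daps-validate flow; not an existing command.
import copy
import os
import subprocess
import tempfile
import xml.etree.ElementTree as ET
from typing import Dict, List, Union
from urllib.parse import urljoin

XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"


def split_by_xml_base(bigfile: Union[str, bytes], main_file: str) -> Dict[str, str]:
    """Map each source file (taken from xml:base) to the XML of its top element.

    xml:base values are resolved against the enclosing base, starting with the
    main file. Nested includes stay inline inside their parent's fragment.
    """
    parts: Dict[str, str] = {}

    def walk(elem: ET.Element, base: str) -> None:
        if XML_BASE in elem.attrib:
            base = urljoin(base, elem.attrib[XML_BASE])
            fragment = copy.copy(elem)  # shallow copy so the tail text can be dropped
            fragment.tail = None
            parts.setdefault(base, ET.tostring(fragment, encoding="unicode"))
        for child in elem:
            walk(child, base)

    walk(ET.fromstring(bigfile), main_file)
    return parts


def validate(main_file: str, schema: str, run=subprocess.run) -> List[str]:
    """Return the source files that failed validation (an empty list means valid)."""
    # Step 1: well-formedness of the main file and its XIncludes, no output.
    if run(["xmllint", "--noout", "--xinclude", main_file]).returncode != 0:
        return [main_file]  # the writer has to fix this first
    # Step 2: build the bigfile, resolving XIncludes and XPointers.
    bigfile = main_file + ".resolved"
    if run(["xmllint", "--xinclude", "--output", bigfile, main_file]).returncode != 0:
        raise RuntimeError("XInclude/XPointer resolution failed")
    # Step 3: validate the bigfile as a whole; success means we are done.
    if run(["jing", schema, bigfile]).returncode == 0:
        return []
    # Step 4: split by xml:base and validate each part. jing -i disables the
    # ID/IDREF checks, since cross-file references cannot resolve in one part.
    with open(bigfile, "rb") as fh:
        parts = split_by_xml_base(fh.read(), main_file)
    failed = []
    with tempfile.TemporaryDirectory() as tmp:
        for number, (source, fragment) in enumerate(parts.items()):
            path = os.path.join(tmp, f"part{number}.xml")
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(fragment)
            if run(["jing", "-i", schema, path]).returncode != 0:
                failed.append(source)
    return failed
```

Reporting the `xml:base` of each failing part at least restores the filename, which is the main drawback of validating a bigfile; line numbers inside each part still refer to the fragment, not to the original file.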

I know this sounds terribly complicated, and it probably is. :cry: There are likely many issues with this approach. Unfortunately, I can't think of a better way.

Do any of you have a better idea?

May 28 '19 08:05 tomschr

> Not seeing the filenames is a real problem for some, do you agree? It makes debugging harder.
>
> That is a big downside. I missed that part, sorry.
>
> OTOH, don't we have `xml:base` for that? Is there any way we could expose that information via jing?
>
> I think it boils down to this simple question: do we want to support XPointers?
>
> A definite "yes" from me.

It rather boils down to the question "at what price do we want to support XPointers?" And I am definitely not willing to pay the price of crippling validation to the point of being almost unusable. If we validate a bigfile, we need to come up with some code that allows us to point to the file and the line where the error occurred.

Does the xmllint run you proposed change the general formatting?

May 28 '19 08:05 fsundermeyer

> If the validation failed:
> a. Split the bigfile according to the `xml:base` attributes (could be done with some XSLT magic).
> b. Iterate over all files created by the split and validate each one with `jing -i`.
>
> Do any of you have a better idea?

Jing somehow returns the line number where the error occurs. We would need to find the corresponding `xml:id` plus the corresponding `xml:base`. Can that be done via XSLT?
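Short of XSLT, one way to answer this is to walk the bigfile once with a streaming parser and record, for each start and end tag, the effective `xml:base` and the nearest enclosing `xml:id`. The sketch below uses Python's built-in expat bindings rather than Saxon or lxml; `index_events` and `locate` are hypothetical names. It only recovers the source file and an anchor ID, not the original line number, and it attributes an end-tag line to the element being closed.

```python
# Hypothetical sketch: map a jing line number on the bigfile to a source file and ID.
import bisect
import xml.parsers.expat
from typing import List, Optional, Tuple, Union
from urllib.parse import urljoin


def index_events(bigfile: Union[str, bytes], main_file: str) -> List[Tuple[int, str, Optional[str]]]:
    """Record (line, effective xml:base, nearest xml:id) for every start and end tag."""
    # No namespace processing, so the attribute names stay literally "xml:base"/"xml:id".
    parser = xml.parsers.expat.ParserCreate()
    bases: List[str] = [main_file]      # effective xml:base of each open element
    ids: List[Optional[str]] = [None]   # nearest enclosing xml:id of each open element
    events: List[Tuple[int, str, Optional[str]]] = []

    def start(name, attrs):
        base = bases[-1]
        if "xml:base" in attrs:
            base = urljoin(base, attrs["xml:base"])  # relative to the parent's base
        bases.append(base)
        ids.append(attrs.get("xml:id", ids[-1]))
        events.append((parser.CurrentLineNumber, bases[-1], ids[-1]))

    def end(name):
        # The end-tag line is attributed to the element being closed.
        events.append((parser.CurrentLineNumber, bases.pop(), ids.pop()))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(bigfile, True)
    return events


def locate(events, line: int) -> Optional[Tuple[str, Optional[str]]]:
    """Return (source_file, xml_id) for the last tag at or before `line`, or None."""
    position = bisect.bisect_right([event[0] for event in events], line) - 1
    if position < 0:
        return None
    _, source_file, xml_id = events[position]
    return source_file, xml_id
```

The "nearest enclosing `xml:id`" is only as useful as the IDs present in the sources; with few IDs, the source file is often the more reliable half of the answer.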

May 28 '19 08:05 fsundermeyer

> It rather boils down to the question "at what price do we want to support XPointers?" And I am definitely not willing to pay the price of crippling validation to the point of being almost unusable.

Our validation already spits out useless line numbers and file paths that don't lead to the real source file in many cases, e.g.:

[...]doc-sle/build/.profiled/x86_64_zseries_power_aarch64_sles/yast2_sw.xml:43:19: error: element "varlistentry" incomplete; missing required element "term"

In this sense, we're just gradually getting worse (which is not a good thing either of course). You only ever get the actual line number/file if your file is not well-formed.
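Turning such messages into something more useful starts with parsing them. The following sketch assumes every message has the `path:line:column: severity: message` shape shown above; treating `warning` and `fatal` as the other possible severities is an assumption, and `parse_jing_line` is a hypothetical helper.

```python
# Hypothetical helper: parse jing messages of the form path:line:column: severity: message.
import re
from typing import NamedTuple, Optional

# Assumed message shape; the non-greedy path stops at the first ":<digits>:<digits>: ".
JING_MESSAGE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): "
    r"(?P<severity>error|warning|fatal): (?P<message>.*)$"
)


class JingMessage(NamedTuple):
    path: str
    line: int
    column: int
    severity: str
    message: str


def parse_jing_line(text: str) -> Optional[JingMessage]:
    """Return the parsed message, or None if `text` does not match the assumed format."""
    match = JING_MESSAGE.match(text.strip())
    if match is None:
        return None
    return JingMessage(
        path=match["path"],
        line=int(match["line"]),
        column=int(match["column"]),
        severity=match["severity"],
        message=match["message"],
    )
```

A wrapper could then replace the profiled `path` and `line` with more helpful information, for example by mapping the line back to the nearest `xml:base`.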

> Jing somehow returns the line number where the error occurs. We would need to find the corresponding `xml:id` plus the corresponding `xml:base`. Can that be done via XSLT?

Saxon has an extension function for this.
libxml + Python/lxml provides this as well, but libxml is limited to 65k lines (I use this in the style checker, but the line number is currently useless, so it's never displayed).

Hitches:

A lot of lines don't have elements with IDs on them to begin with
Some source IDs are removed when profiling, and `<remark/>` elements, comments, and DOCTYPE headers may also be profiled away (though we could fix that). All of this leads to a significant line offset between the original file and the output file.

May 28 '19 09:05 ghost
