daps
daps copied to clipboard
Fix #510: Make XPointer work
This PR fixes #510.
Jing fails as it depends on Xerces. As Xerces does not support the xpointer()
scheme, we need to give jing a "complete" document. The document is created by xmllint
.
Currently it's an ugly implementation as it appends ".resolved" to the PROFILED_MAIN
variable. Probably we need to adapt the code in other locations too.
For the time being, it's just fixes the validate
subcommand.
Addition: what's about the following suggestion?
-
daps validate
would use the current validation. Nothing changed. -
daps validate --xpointer
would enable the different resolution path with XPointers (resolve them withxmllint
and validate withjing
).
daps validate --xpointer
@tomschr What is the point of adding an option here instead of just making this default behavior? Getting validation failures unless you run with an extra option sounds rather inconvenient.
daps validate --xpointer
@tomschr What is the point of adding an option here instead of just making this default behavior? Getting validation failures unless you run with an extra option sounds rather inconvenient.
It is. Actually I consider it a hack. The whole problem are our tools that we use:
Tool | Supported XPointer Schemes | Validation Support RELAX NG |
---|---|---|
xmllint | xpointer() and element() |
partly |
jing | element() |
full |
Whichever component you use, you will run into problems. :cry:
IMHO, the "default behavior" should be the validation with jing. I guess, not all projects need or want XPointers, so they can stick with the Jing validation. The "extended validation" is mainly needed for SLE.
Unfortunately, if we really want to support XPointer (and we already use it) I don't see a better solution yet (except the "option" to fix the tools, but you know how this will end...). Unfortunately, we have some implementation issues:
- Creating a bigfile with
xmllint
and validating it withjing
would technically work. However, if there are validation errors, you will loose the filenames (as it's only one file). That makes it harder to spot any errors. - Validating the document with
xmllint
alone won't work as validation with RELAX NG is only partially supported. :cry: - Validating the document with
jing
alone does work unless nobody uses thexpointer()
scheme.
So how to solve this? That's why I've suggested this additional option (we could also named it --full
).
IMHO, the "default behavior" should be the validation with jing. I guess, not all projects need or want XPointers, so they can stick with the Jing validation. The "extended validation" is mainly needed for SLE.
Is there any real downside to just applying the XPointer fix in all cases? (Besides the extra second that it may take.)
I am absolutely not in favor of making life harder for everyone by adding extra parameters. I would be much in favor of using the same validation everywhere.
IMHO, the "default behavior" should be the validation with jing. I guess, not all projects need or want XPointers, so they can stick with the Jing validation. The "extended validation" is mainly needed for SLE.
Is there any real downside to just applying the XPointer fix in all cases? (Besides the extra second that it may take.)
Well, depends what you mean with real. Seeing not the filenames is a real problem for some, do you agree? It makes debugging harder.
I am absolutely not in favor of making life harder for everyone by adding extra parameters. I would be much in favor of using the same validation everywhere.
Sure, that would be my aim as well. Unfortunately, the tools are the bottleneck.
I think it boils down to this simple question: do we want to support XPointers? If no, we can close this issue. If yes, we need to find a solution. At the moment we have these options:
- Create an intermediate bigfile and validate that.
- Find another solution to keep the filename.
- Fix the tools.
@fsundermeyer Could we have number 3, please?
Seeing not the filenames is a real problem for some, do you agree? It makes debugging harder.
That is a big downside. I missed that part, sorry.
Otoh--don't we have xml:base for that? Is there any way we could expose that information via jing?
I think it boils down to this simple question: do we want to support XPointers?
A definite "yes" from me.
[...] Otoh--don't we have xml:base for that? Is there any way we could expose that information via jing?
Yes, we have xml:base
. That is automatically added during XInclude resolution. However, exposing it to jing doesn't have any advantages. It's just an attribute, nothing special.
Unfortunately, I fear, we have not many options:
- Validate with jing and bigfile and live with the drawback that we need to do some more research when having validation issues.
- Create some magic
daps-validate
command which does some "tricks" (whatever they will be). - Fix the tools (which would my preferred option).
As number 3 is probably unlikely or takes too much time, we are left with 1 + 2. If we do not want to accept the drawback on 1, that's gives us only option 2.
If we theoretically implement a daps-validate
command it could do the following steps:
- Check for well-formedness. If there are any errors, the writer needs to fix them first.
- Create a bigfile with
xmllint
. We need to resolve any XIncludes and XPointers. - Validate the bigfile with
jing
.- If the validation was successful: be happy and dance. :man_dancing:
- If the validation failed:
a. Split the bigfile according to the
xml:base
(could be done by some XSLT magic). b. Iterate over all files created byxml:base
and validate it viajing -i
.
I know, this sound terribly complicated and probably it is. :cry: It's likely there are many issues with this approach. Unfortunately, I can't think of a better way.
Do any of you have a better idea?
Seeing not the filenames is a real problem for some, do you agree? It makes debugging harder. That is a big downside. I missed that part, sorry. Otoh--don't we have xml:base for that? Is there any way we could expose that information via jing? I think it boils down to this simple question: do we want to support XPointers? A definite "yes" from me.
It rather boils down to the question "at what price do we want to support xpointers"? And I am definitely not willing to pay the price of crippling validation to being almost unusable. If we validate on a bigfile, we need to come up with some code that allows us to point to the file and the line where the error occurred.
Does the xmllint run you proposed change the general formatting?
- If the validation failed: a. Split the bigfile according to the
xml:base
(could be done by some XSLT magic). b. Iterate over all files created byxml:base
and validate it viajing -i
.
Do any of you have a better idea?
Jing somehow returns the line number where the error occurs. We would need to find the corresponding xml:id plus the corresponding xml:base. Can that be done via xslt?
It rather boils down to the question "at what price do we want to support xpointers"? And I am definitely not willing to pay the price of crippling validation to being almost unusable.
Our validation already spits out useless line numbers and file paths that don't lead to the real source file in many cases, e.g.:
[...]doc-sle/build/.profiled/x86_64_zseries_power_aarch64_sles/yast2_sw.xml:43:19: error: element "varlistentry" incomplete; missing required element "term"
In this sense, we're just gradually getting worse (which is not a good thing either of course). You only ever get the actual line number/file if your file is not well-formed.
Jing somehow returns the line number where the error occurs. We would need to find the corresponding xml:id plus the corresponding xml:base. Can that be done via xslt?
- Saxon has an extension function for this.
- libxml + Python/lxml provides this as well but libxml is limited to 65k lines (I use this one in the style checker, but the line number is currently useless, so it's never displayed).
Hitches:
- A lot of lines don't have elements with IDs on them to begin with
- Some source IDs are removed when profiling,
<remark/>
, comments and DOCTYPE headers may also be profiled away [though we could fix that], all of this leads to a significant line diff between original file and output file