Provide validator

Open nichtich opened this issue 5 years ago • 3 comments

The documentation on HTML patterns states

Ultimately, well-formed and valid HTML - along with accompanying RDFa, Turtle, JSON-LD, TriG etc - is the only requirement here

This is only true in theory. In practice dokieli.js expects some patterns such as Polyglot Markup (?), Outline with title, main > article, and section etc. I stumbled upon this with a document that uses a flat list of h1, h2... so the table of contents could not be generated properly.

It would be helpful to have a validator script that checks for

  • well-formed and valid HTML
  • expected outline (e.g. section elements)
  • well-formed embedded RDF (if given)
  • meaningful embedded RDF (e.g. no unknown namespaces)
  • best-practice (e.g. aside should appear as the last element node in section)

The validation rules do not need to be very strict, but even "well-formed and valid HTML+RDFa" must be validated. All data should be assumed not to conform to any standard unless conformance is actually checked.
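To make the outline and best-practice checks concrete, here is a minimal sketch of such a linter using only Python's standard `html.parser`. The two rules (an `h2`-`h6` must have a `section` ancestor; nothing may follow an `aside` that is a direct child of a `section`) are one reading of the list above, not dokieli's actual rules, and the names `OutlineLinter` and `lint_outline` are made up for illustration.

```python
from html.parser import HTMLParser

# Assumption: h1 is the document title and may live outside a section.
SUBHEADINGS = {"h2", "h3", "h4", "h5", "h6"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class OutlineLinter(HTMLParser):
    """Collects warnings for two illustrative structure rules."""

    def __init__(self):
        super().__init__()
        self.stack = []     # open elements, innermost last
        self.warnings = []

    def handle_starttag(self, tag, attrs):
        parent = self.stack[-1] if self.stack else None
        # Rule 1: an aside should be the last element node in its section.
        if parent and parent["tag"] == "section" and parent["aside_closed"]:
            self.warnings.append(f"<{tag}> follows <aside> inside <section>")
        # Rule 2: sub-headings are expected to sit inside a section.
        if tag in SUBHEADINGS and not any(e["tag"] == "section" for e in self.stack):
            self.warnings.append(f"<{tag}> is not inside a <section>")
        if tag not in VOID:
            self.stack.append({"tag": tag, "aside_closed": False})

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise close everything up to the match.
        if not any(e["tag"] == tag for e in self.stack):
            return
        while self.stack.pop()["tag"] != tag:
            pass
        if tag == "aside" and self.stack and self.stack[-1]["tag"] == "section":
            self.stack[-1]["aside_closed"] = True


def lint_outline(html):
    """Return a list of warning strings for the given HTML source."""
    linter = OutlineLinter()
    linter.feed(html)
    linter.close()
    return linter.warnings
```

Called on a document with a flat list of headings, `lint_outline` reports each `h2` that is not wrapped in a `section`. It does not check well-formedness or validity; that would still need a real HTML validator.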

nichtich avatar Jun 23 '19 20:06 nichtich

I think it goes a bit without saying that dokieli will consume and try to accommodate certain patterns out there, but it can't turn any arbitrary pattern into something useful. We can of course improve recognising commonly used patterns (eg. the example with flat headings? IIRC, the HTML spec has an algorithm for an outline that could be perhaps implemented here). Nevertheless, it is not an all or nothing situation, so some of the functions can still work in dokieli eg. while the webpage may have garbage HTML, we can still annotate I think.
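As a rough illustration of how a flat list of headings could still yield a table of contents, the sketch below nests headings purely by their level. It is a simplification, not the HTML spec's outline algorithm and not what dokieli currently does; `HeadingCollector` and `build_outline` are hypothetical names.

```python
import re
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collects (level, text) pairs for h1-h6 in document order."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._level = None   # level of the heading currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if re.fullmatch(r"h[1-6]", tag):
            self._level = int(tag[1])
            self._text = []

    def handle_data(self, data):
        if self._level is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._level is not None and tag == f"h{self._level}":
            self.headings.append((self._level, "".join(self._text).strip()))
            self._level = None


def build_outline(html):
    """Nest flat headings into a tree based only on heading level."""
    collector = HeadingCollector()
    collector.feed(html)
    collector.close()
    root = {"level": 0, "title": None, "children": []}
    stack = [root]  # path from the root to the most recent heading
    for level, title in collector.headings:
        node = {"level": level, "title": title, "children": []}
        # Climb up until the top of the stack is a shallower heading.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]
```

For `<h1>Title</h1><h2>Intro</h2><h3>Scope</h3><h2>Method</h2>` this produces one top-level entry, "Title", with children "Intro" (containing "Scope") and "Method".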

As for what it generates, it has its own patterns with the intention of having some consistency and reuse.

I've been hesitant to bring a validator to the mix for two reasons:

  • there are going to be things outside of dokieli's knowledge that the author wants, so dokieli shouldn't interfere.

  • the vast majority of HTML pages are probably invalid.

In any case, perhaps I've misunderstood what you mean by a validator script. Is that for what dokieli consumes and/or generates?

I generally like the idea of running through a canonical set of functions that tell us what should go where and how. I like the example with aside. It makes me think of things like DO.C.DocumentItems, which has an order for certain common blocks, and dokieli looks that up when it needs to know where to insert an item. I think this sort of thinking is what you're raising, right? I think the patterns in dokieli are generally consistent, but perhaps this is where the templating stuff can help.
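The order-lookup idea can be sketched roughly as follows. The list contents and the function name `insertion_index` are invented for illustration and do not reflect the actual value of DO.C.DocumentItems or how dokieli uses it.

```python
# Hypothetical canonical order of common blocks (not dokieli's real list).
DOCUMENT_ITEMS = ["authors", "abstract", "table-of-contents", "references"]


def insertion_index(present, new_item, order=DOCUMENT_ITEMS):
    """Return the index in `present` at which `new_item` should be inserted.

    `present` lists the ids of blocks already in the document, in document
    order. Blocks unknown to `order` are skipped over rather than moved.
    """
    rank = {item: i for i, item in enumerate(order)}
    if new_item not in rank:
        return len(present)  # unknown items simply go last
    for index, item in enumerate(present):
        # Insert before the first known block that should come later.
        if rank.get(item, -1) > rank[new_item]:
            return index
    return len(present)
```

With `present = ["authors", "references"]`, inserting `"abstract"` gives index 1, that is, between the two existing blocks. A linter could reuse the same order to flag blocks that appear out of sequence.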

Perhaps the wording in "Ultimately, well-formed and valid HTML - along with accompanying RDFa, Turtle, JSON-LD, TriG etc - is the only requirement here" is not accurate. Instead of "requirement", I think I meant "goal" or "aim". That paragraph needs a rewrite.

csarven avatar Jun 26 '19 17:06 csarven

Well, every unexpected behaviour of dokieli.js could be either a bug or a requirement on the document that's being processed. Maybe don't call it a validator but a linter that checks for common pitfalls (such as not using section tags) and suspicious pieces of HTML+RDFa.

DO.C.DocumentItems where it has an order for certain common blocks, and dokieli looks that up when it needs to know where to insert an item. I think this sort of thinking is what you're raising, right?

Yes, but I'd also catch invalid HTML and invalid or unrecognizable RDF (such as typos in namespace prefixes, up to undefined RDF properties). There are standard tools to do so (e.g. https://validator.github.io/validator/), but authors should not be required to figure out how to find, set up, and run these tools.
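A check for typos in namespace prefixes could look roughly like this sketch. It makes several simplifying assumptions: prefixes are declared only through the RDFa `prefix` attribute and are treated as document-wide rather than scoped to an element, only a handful of attributes are inspected, and any predefined prefixes the author relies on must be passed in via `known`. The names `PrefixCollector` and `undeclared_prefixes` are hypothetical.

```python
from html.parser import HTMLParser

# Attributes assumed to carry CURIEs; a fuller tool would cover more.
CURIE_ATTRS = {"property", "typeof", "rel", "rev", "datatype"}


class PrefixCollector(HTMLParser):
    """Gathers declared prefixes and the prefixes actually used."""

    def __init__(self):
        super().__init__()
        self.declared = set()
        self.used = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name == "prefix":
                # "schema: http://schema.org/ foaf: http://..." alternates
                # between "name:" tokens and IRIs.
                tokens = value.split()
                self.declared.update(t[:-1] for t in tokens[0::2] if t.endswith(":"))
            elif name in CURIE_ATTRS:
                for token in value.split():
                    prefix, sep, rest = token.partition(":")
                    # Skip plain terms, blank nodes and absolute IRIs.
                    if sep and prefix and prefix != "_" and not rest.startswith("//"):
                        self.used.add(prefix)


def undeclared_prefixes(html, known=frozenset()):
    """Return the sorted prefixes used but neither declared nor in `known`."""
    collector = PrefixCollector()
    collector.feed(html)
    collector.close()
    return sorted(collector.used - collector.declared - set(known))
```

For a document that declares `schema:` and `foaf:` but contains `property="shema:author"`, the function returns `["shema"]`. Catching undefined RDF properties would additionally require looking up each term in its vocabulary, which this sketch does not attempt.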

nichtich avatar Jun 26 '19 19:06 nichtich

linter to check for common pitfalls

catch invalid HTML and invalid or unrecognizable RDF

Anything that's created through the dokieli UI is intended to be valid.

The Save operation, for example, takes what's in the DOM and normalises the HTML before saving. Save As, on the other hand, uses the source HTML as is.

What should change? I'm not against a linter, but I'm trying to understand if and where it can fit in. I think I'm missing the "why".

csarven avatar Jun 27 '19 08:06 csarven