go-toolkit

Support for parsing metadata using the `a11y` prefix

Open HadrienGardeur opened this issue 1 year ago • 12 comments

While casually comparing the output of the Go Toolkit with an OPF from Standard Ebooks, I noticed that metadata using the a11y prefix does not seem to be parsed properly.

In this specific case, it's a11y:certifiedBy that is affected: https://github.com/standardebooks/nathaniel-hawthorne_the-house-of-the-seven-gables/blob/master/src/epub/content.opf#L19

Since we're currently working on supporting all key accessibility metadata for EAA, this is in scope for our work.

HadrienGardeur avatar Feb 08 '25 09:02 HadrienGardeur

I thought it could be because the a11y prefix is not declared in the OPF, but it isn't declared either in our test cases that cover the a11y prefix. Maybe it's the refines attribute?

mickael-menu avatar Feb 08 '25 11:02 mickael-menu

I thought about that too when I first encountered this example, but the spec is pretty clear that there's no need to declare the prefix.

HadrienGardeur avatar Feb 08 '25 17:02 HadrienGardeur

Watch out for things like a11y:pageBreakSource vs pageBreakSource; the specification examples were inconsistent:

https://www.w3.org/publishing/a11y/page-source-id/#examples

https://www.w3.org/TR/epub-a11y-tech-11/#pageSource

(should not have the prefix, I believe it's fixed now)

danielweck avatar Feb 08 '25 17:02 danielweck

yeah https://github.com/w3c/epub-specs/commit/444a3e661611a550303a1ec9d4f1d80dfe451750

danielweck avatar Feb 08 '25 17:02 danielweck

@HadrienGardeur The reason is that the parser (which closely followed the kotlin-toolkit at the time of implementation) parses this a11y:certifiedBy tag in the OPF you provided and adds it internally as a "child" of the conformance-statement tag, because it has refines="#conformance-statement" set. I could fix this by checking the children, as per the screenshot below (green part):

Image

However, is this potentially an issue with all the other accessibility tags as well, not just certifiedBy? If so, some more significant changes might be necessary.

Relevant XML for everyone:

<meta id="conformance-statement" property="dcterms:conformsTo">EPUB Accessibility 1.1 - WCAG 2.2 Level AA</meta>
<meta property="a11y:certifiedBy" refines="#conformance-statement">Standard Ebooks</meta>
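
To make the nesting concrete, here is a minimal Go sketch of the behavior described above. It is not the toolkit's actual parser: the `Meta` type and the `buildTree`, `certifiedBy` and `parseCertifier` helpers are hypothetical names, and the sample is reduced to a bare `metadata` element. It attaches each refining `meta` to its target, then looks for a11y:certifiedBy at the top level before falling back to the children of dcterms:conformsTo.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Meta mirrors an OPF <meta> element (hypothetical type, not the toolkit's).
// Children holds the metas that refine this one.
type Meta struct {
	ID       string  `xml:"id,attr"`
	Property string  `xml:"property,attr"`
	Refines  string  `xml:"refines,attr"`
	Value    string  `xml:",chardata"`
	Children []*Meta `xml:"-"`
}

type metadataDoc struct {
	Metas []*Meta `xml:"meta"`
}

// buildTree attaches every refining meta to its target and returns the
// metas that remain at the top level.
func buildTree(metas []*Meta) []*Meta {
	byID := map[string]*Meta{}
	for _, m := range metas {
		if m.ID != "" {
			byID[m.ID] = m
		}
	}
	var roots []*Meta
	for _, m := range metas {
		if parent, ok := byID[strings.TrimPrefix(m.Refines, "#")]; ok {
			parent.Children = append(parent.Children, m)
		} else {
			roots = append(roots, m)
		}
	}
	return roots
}

// certifiedBy checks top-level metas first, then the children of any
// dcterms:conformsTo meta.
func certifiedBy(roots []*Meta) string {
	for _, m := range roots {
		if m.Property == "a11y:certifiedBy" {
			return strings.TrimSpace(m.Value)
		}
	}
	for _, m := range roots {
		if m.Property != "dcterms:conformsTo" {
			continue
		}
		for _, c := range m.Children {
			if c.Property == "a11y:certifiedBy" {
				return strings.TrimSpace(c.Value)
			}
		}
	}
	return ""
}

const sample = `<metadata>
  <meta id="conformance-statement" property="dcterms:conformsTo">EPUB Accessibility 1.1 - WCAG 2.2 Level AA</meta>
  <meta property="a11y:certifiedBy" refines="#conformance-statement">Standard Ebooks</meta>
</metadata>`

// parseCertifier parses the metadata fragment and returns the certifier, if any.
func parseCertifier(opf string) (string, error) {
	var doc metadataDoc
	if err := xml.Unmarshal([]byte(opf), &doc); err != nil {
		return "", err
	}
	return certifiedBy(buildTree(doc.Metas)), nil
}

func main() {
	name, err := parseCertifier(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(name) // prints "Standard Ebooks"
}
```

Without the second loop in `certifiedBy`, the refining meta is never seen at the top level, which matches the symptom reported in this issue.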

chocolatkey avatar Feb 10 '25 08:02 chocolatkey

Thank you for the explanation @chocolatkey.

This could indeed be an issue with other metadata that are natively supported in RWPM (vs URI based extensions) and we should be aware of it.

In this specific case:

  • we support multiple values for conformance (conformsTo)
  • but we don't link these values to certification (the certification object stands on its own in accessibility, it has no relationship whatsoever to conformsTo)
  • and we only support a single object in certification, we don't allow for multiple values/objects

This means that we should IMO:

  • use the first value that either stands on its own in the OPF or refines an accessibility conformance statement
  • and use our built-in extensibility (URLs) for everything else
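
One possible reading of this rule, sketched in Go. The `metaItem` type and the `selectCertifier` function are hypothetical, and treating every additional certifiedBy value as "everything else" is an assumption on my part, not a settled design. The sketch keeps the first a11y:certifiedBy that is either standalone or refines dcterms:conformsTo, and sets the remaining values aside so they can be exposed under the full property URI.

```go
package main

import "fmt"

// metaItem is a hypothetical, already-parsed OPF <meta>. RefinesProperty is
// the property of the element it refines, or "" when it stands on its own.
type metaItem struct {
	Property        string
	Value           string
	RefinesProperty string
}

// Full URI for a11y:certifiedBy, using the reserved a11y prefix mapping.
const certifiedByURI = "http://www.idpf.org/epub/vocab/package/a11y/#certifiedBy"

// selectCertifier returns the first eligible certifiedBy value and all the
// remaining certifiedBy values, in document order.
func selectCertifier(items []metaItem) (string, []string) {
	var first string
	var rest []string
	found := false
	for _, it := range items {
		if it.Property != "a11y:certifiedBy" {
			continue
		}
		eligible := it.RefinesProperty == "" || it.RefinesProperty == "dcterms:conformsTo"
		if eligible && !found {
			first, found = it.Value, true
			continue
		}
		rest = append(rest, it.Value)
	}
	return first, rest
}

func main() {
	items := []metaItem{
		{Property: "dcterms:conformsTo", Value: "EPUB Accessibility 1.1 - WCAG 2.2 Level AA"},
		{Property: "a11y:certifiedBy", Value: "Standard Ebooks", RefinesProperty: "dcterms:conformsTo"},
		{Property: "a11y:certifiedBy", Value: "Another Certifier"},
	}
	first, rest := selectCertifier(items)
	fmt.Println("certifier (native field):", first)
	fmt.Println(certifiedByURI+":", rest)
}
```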

HadrienGardeur avatar Feb 10 '25 08:02 HadrienGardeur

@HadrienGardeur So should this logic of checking the children of conformsTo apply just to the certifiedBy property, or others as well?

chocolatkey avatar Feb 10 '25 20:02 chocolatkey

How do we currently parse metadata using a refine statement? Do you keep the relationship between both metadata somehow?

HadrienGardeur avatar Feb 10 '25 22:02 HadrienGardeur

@HadrienGardeur If I understand what you're saying correctly, the answer is yes. Any tags that refine another tag are "children" of that tag. That's why one quick, potentially naive fix is to check the children of the conformsTo tag.

chocolatkey avatar Feb 13 '25 04:02 chocolatkey

How do we deal with that in the RWPM?

Do we use something like this?

"parent": {
  "value": "123",
  "child": "456"
}

HadrienGardeur avatar Feb 13 '25 08:02 HadrienGardeur

With unknown metadata, it looks like this:

<package prefix="myPrefix: http://my.url/#">
  <metadata>
    <meta id="customProperty" property="myPrefix:customProperty">Custom property</meta>
    <meta refines="#customProperty" property="myPrefix:refine1">Refine 1</meta>
    <meta refines="#customProperty" property="myPrefix:refine2">Refine 2</meta>
  </metadata>
</package>

{
    "metadata": {
        "http://my.url/#customProperty": {
            "@value": "Custom property",
            "http://my.url/#refine1": "Refine 1",
            "http://my.url/#refine2": "Refine 2"
        }
    }
}
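
The mapping from refinements to nested JSON keys can be sketched in Go as follows. This is an illustration under assumptions, not the toolkit's serializer: `refinedMeta`, `expand` and `toJSONValue` are hypothetical names. A property without refinements stays a plain string, while a refined one becomes an object holding `@value` plus one key per refinement, each expanded to a full URI through the declared prefixes.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// refinedMeta is a hypothetical parsed <meta> together with the metas refining it.
type refinedMeta struct {
	Property    string
	Value       string
	Refinements []refinedMeta
}

// expand resolves "prefix:name" against the declared prefixes. Properties
// with no prefix or an unknown prefix are returned unchanged.
func expand(property string, prefixes map[string]string) string {
	prefix, name, ok := strings.Cut(property, ":")
	if !ok {
		return property
	}
	if base, known := prefixes[prefix]; known {
		return base + name
	}
	return property
}

// toJSONValue returns a plain string when there are no refinements, and an
// object with "@value" plus one key per refinement otherwise.
func toJSONValue(m refinedMeta, prefixes map[string]string) any {
	if len(m.Refinements) == 0 {
		return m.Value
	}
	obj := map[string]any{"@value": m.Value}
	for _, r := range m.Refinements {
		obj[expand(r.Property, prefixes)] = toJSONValue(r, prefixes)
	}
	return obj
}

func main() {
	prefixes := map[string]string{"myPrefix": "http://my.url/#"}
	custom := refinedMeta{
		Property: "myPrefix:customProperty",
		Value:    "Custom property",
		Refinements: []refinedMeta{
			{Property: "myPrefix:refine1", Value: "Refine 1"},
			{Property: "myPrefix:refine2", Value: "Refine 2"},
		},
	}
	doc := map[string]any{
		"metadata": map[string]any{
			expand(custom.Property, prefixes): toJSONValue(custom, prefixes),
		},
	}
	out, err := json.MarshalIndent(doc, "", "    ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A "native" RWPM property that is a plain string, number or boolean has no such object form to hang refinements on, so this approach only works as-is for the URI-based extensions.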

mickael-menu avatar Feb 13 '25 09:02 mickael-menu

Thanks @mickael-menu, it's been a while so it helps to refresh my memory.

Overall, this means that our support for extensibility should already work as expected. The problem seems to be with "native" properties for RWPM, where we might skip such refinements.

For all "native" properties, we need to make sure that they're parsed properly and I think that we can live with the lack of an equivalent of the refine statements.

Can you focus on these a11y properties for now, @chocolatkey?

We can also file a separate issue somewhere (architecture, since it affects all toolkits?) to discuss how we should handle refine statements on our "native" properties. With properties that use an object representation, it should be straightforward; with strings, integers/numbers and booleans, less so.

HadrienGardeur avatar Feb 13 '25 10:02 HadrienGardeur