
attribute value constraint and processing disagreement

Open sydb opened this issue 7 years ago • 11 comments

In creating schemas via ODD we use lots of attributes that are declared as teidata.name (e.g. @key and @ident). RELAX NG permits leading and trailing whitespace. However odds/odd2odd.xsl does not strip leading and trailing whitespace.

Thus

<memberOf key="att.typed"/>

is valid and works fine, but

<memberOf key=" att.typed "/>

is valid but silently fails: in the output schema the element being specified does not actually get the @type and @subtype attributes.
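A minimal sketch of the failure mode, in Python rather than XSLT, and not the actual odds/odd2odd.xsl logic: an exact-match lookup on an untrimmed key finds nothing and raises no error. The `CLASSES` table and the `attributes_for` helper are hypothetical stand-ins for illustration; `normalize_space` only approximates the XPath function of the same name.

```python
import re

def normalize_space(value: str) -> str:
    """Approximate XPath normalize-space(): trim and collapse XML whitespace."""
    # Collapse runs of space, tab, CR, LF to one space, then trim the ends.
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")

# Hypothetical registry of attribute classes, keyed by their identifier.
CLASSES = {"att.typed": ["type", "subtype"]}

def attributes_for(key: str, normalize: bool = False):
    """Look up a class by key; returns None (no error) when nothing matches."""
    lookup = normalize_space(key) if normalize else key
    return CLASSES.get(lookup)

print(attributes_for("att.typed"))                    # ['type', 'subtype']
print(attributes_for(" att.typed "))                  # None: the silent miss
print(attributes_for(" att.typed ", normalize=True))  # ['type', 'subtype']
```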

Seems to me we should either

  1. Shore up the schema so that it objects if there is whitespace in the value of one of these kinds of attributes, perhaps via an added Schematron rule.
  2. Test for spaces in the processing and issue a warning or error if they are present.
  3. Test to see if the <memberOf> (or <dataRef> or whatever) found something via the provided @key, and issue a warning if not found. (I haven’t thought this through at all, but I suspect it would be the best way to go for @key but would not work at all for @ident.)
  4. Consistently use normalize-space() on these values in the processing.

I don’t have a preference at the moment. Also seems to me to be pretty low priority, as it has probably been this way for well over a decade and no one has complained.

sydb avatar Jan 05 '19 19:01 sydb

Option 4 plz.

Next.

lb42 avatar Jan 05 '19 20:01 lb42

All four options please. :-)

martindholmes avatar Jan 05 '19 20:01 martindholmes

Definitely 4

jamescummings avatar Jan 05 '19 20:01 jamescummings

OK. Just to make sure you realize what we’re getting into with (4), it would mean that we would have to change things like (e.g.) the @use of the first <xsl:key> in odds/odd2odd.xsl from "concat(../../@ident,'_',@ident)" to something like one of

  1. "concat( normalize-space(../../@ident),'_',normalize-space(@ident) )"
  2. "concat( ../../@ident/normalize-space(), '_', @ident/normalize-space() )"

And that we have to change every lookup into that key() to match.

(Note that "normalize-space( concat( ../../@ident, '_', @ident ) )" would be scary, because trailing whitespace of the grandparent @ident or leading whitespace of the @ident would become internal whitespace, and be changed to a single U+0020. I don’t know what happens if you use a string with a space as a key. Probably works, but …)
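To make the difference concrete, here is a rough Python sketch of the two orderings. The identifier values "div " and " type" are made up for illustration, and `normalize_space` only approximates the XPath function; nothing here is taken from odds/odd2odd.xsl itself.

```python
import re

def normalize_space(value: str) -> str:
    """Approximate XPath normalize-space(): trim and collapse XML whitespace."""
    return re.sub(r"[ \t\r\n]+", " ", value).strip(" ")

def key_per_part(parent_ident: str, ident: str) -> str:
    # Mirrors concat(normalize-space(../../@ident), '_', normalize-space(@ident))
    return normalize_space(parent_ident) + "_" + normalize_space(ident)

def key_whole(parent_ident: str, ident: str) -> str:
    # Mirrors normalize-space(concat(../../@ident, '_', @ident))
    return normalize_space(parent_ident + "_" + ident)

# Hypothetical values: trailing space on the grandparent, leading space on the child.
print(repr(key_per_part("div ", " type")))  # 'div_type'
print(repr(key_whole("div ", " type")))     # 'div _ type'
```

Normalizing each part separately yields the same key as clean input would, whereas normalizing the concatenation leaves spaces around the underscore, so it would not match a key built from clean values.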

sydb avatar Jan 05 '19 23:01 sydb

Are you allowed to have multiple values? If so we should process with that in mind. If not then why not strip them with translate or something rather than normalise space? Not saying that is less scary.

jamescummings avatar Jan 05 '19 23:01 jamescummings

No, @jamescummings, the values I am talking about are singular. (There are others that are multiple, like moduleRef/@include.) And sure, there are other ways to do this.

sydb avatar Jan 06 '19 02:01 sydb

4

teidata.name is defined as whitespace-free in the text, so this is just checking an existing constraint.

Cheers Stuart

stuartyeates avatar Jan 06 '19 06:01 stuartyeates

Set to “pending” to verify that the original problem really is a problem, i.e. that " att.typed " is valid and does produce processing errors.

sydb avatar Jun 02 '20 14:06 sydb

Assigning myself to perform the verification described above.

sydb avatar Nov 03 '21 01:11 sydb

Verified. This little ODD demonstrates the problem. It is valid against tei_all and p5odds (and almost valid against tei_customization). But the added elements show up as

tst_353_one =
  element one {
    xsd:token { pattern = "[^\p{C}\p{Z}]+" },
    tst_353_att.global.attributes,
    tst_353_att.typed.attributes,
    empty
  }

tst_353_two =
  element two {
    xsd:token { pattern = "[^\p{C}\p{Z}]+" },
    tst_353_att.global.attributes,
    empty
  }

in the RNC. (Note the missing “tst_353_att.typed.attributes” for the definition of <tst:two>.)

sydb avatar Sep 09 '22 18:09 sydb

Since no one else has picked this up in the last 20 months, I have re-assigned it to myself. Sigh.

sydb avatar Jun 02 '25 03:06 sydb