Allow for UTF-8 field values in header regular expression
Use [:print:] in the header regex and note that for ASCII it is equivalent to [ -~] and that the aim is to forbid control characters. Fixes #719.
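A quick sanity check of the ASCII equivalence mentioned above, as a rough sketch. Python's `re` module does not understand POSIX classes such as `[:print:]`, so this uses `str.isprintable()` as a stand-in, on the assumption that it agrees with `[[:print:]]` over the ASCII range.

```python
import re

# ASCII code points matched by the range-based class [ -~].
ascii_range = {c for c in range(128) if re.fullmatch(r"[ -~]", chr(c))}

# ASCII code points Python considers printable (stand-in for [[:print:]]).
ascii_printable = {c for c in range(128) if chr(c).isprintable()}

# The two sets coincide on ASCII: both are exactly 0x20..0x7E.
same_on_ascii = ascii_range == ascii_printable

# Outside ASCII they differ: a printable non-ASCII letter is rejected by [ -~].
e_acute_in_range = re.fullmatch(r"[ -~]", "é") is not None
e_acute_printable = "é".isprintable()
```

Control characters such as TAB and DEL fall outside both sets, which is the point of the restriction.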
To be honest, I'm tempted to add the extra [] to the \cclass definition and waste a bit of space each time this appears rather than add the “For brevity” sentence.
An alternative to this PR might be to just leave the regex as [ -~] and add a footnote explaining that this is an oversimplification for fields that allow Unicode values.
I'm unsure about the extra brackets too. My inclination, though, is that inventing our own syntax is probably not worth the hassle, and that we should just go with the official double-bracket style.
I'm guessing, however, that the extra brackets were there to permit things like [:[:alnum:]] meaning [:A-Za-z0-9] without ambiguity.
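To illustrate the ambiguity point, here is a minimal sketch that textually expands `[:name:]` tokens into ASCII ranges. The `POSIX_ASCII` table and `expand_posix` helper are hypothetical names invented for this example, not anything from the spec or its tooling.

```python
import re

# Hypothetical ASCII expansions for a few POSIX class names (illustration only).
POSIX_ASCII = {"alnum": "A-Za-z0-9", "digit": "0-9", "print": " -~"}

def expand_posix(pattern: str) -> str:
    """Replace each [:name:] token with its ASCII range text."""
    return re.sub(r"\[:([a-z]+):\]", lambda m: POSIX_ASCII[m.group(1)], pattern)

# Double-bracket form: the outer brackets survive, so ':' can sit beside the class.
double = expand_posix("[:[:alnum:]]")   # "[:A-Za-z0-9]"

# Single-bracket form: the token consumes the brackets, leaving no room for extra members.
single = expand_posix("[:alnum:]")      # "A-Za-z0-9"

matches_colon = re.fullmatch(double + "+", "a:b1") is not None
```

With the standard form, the outer brackets delimit the set and the inner `[:…:]` names a class within it, so extra members can be added without inventing new syntax.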
~~Note that UTS #18: Annex C suggests :print: character class compatibility for Unicode as \p{graph}\p{blank}--\p{cntrl}, which is likely not the appropriate definition here since it includes :blank:. It may be better to note a property matcher instead, e.g., \P{Other}.~~
Which specific :blank: characters are you worried about inadvertently including? (The obvious worry is TAB, but presumably that is removed by \p{cntrl}.)
Ah, yes, you're right. I had a set operation wrong when I tested with an example. :print: seems sufficient for this change.
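For reference, the set arithmetic can be checked with a small sketch of the UTS #18 Annex C compatibility definitions. The helper names are hypothetical, and `\p{space}` is only approximated here with `str.isspace()`, so treat this as indicative rather than exact.

```python
import unicodedata

def is_cntrl(ch: str) -> bool:
    # \p{cntrl}: General_Category = Control
    return unicodedata.category(ch) == "Cc"

def is_blank(ch: str) -> bool:
    # \p{blank}: Space_Separator plus CHARACTER TABULATION
    return ch == "\t" or unicodedata.category(ch) == "Zs"

def is_graph(ch: str) -> bool:
    # \p{graph}: everything except space, Control, Surrogate, Unassigned
    # (assumption: str.isspace() stands in for \p{space})
    return not ch.isspace() and unicodedata.category(ch) not in ("Cc", "Cs", "Cn")

def is_print(ch: str) -> bool:
    # \p{print}: (graph union blank) minus cntrl
    return (is_graph(ch) or is_blank(ch)) and not is_cntrl(ch)

tab_is_blank = is_blank("\t")
tab_is_print = is_print("\t")
space_is_print = is_print(" ")
```

TAB enters through `blank` but is subtracted again by `cntrl`, while the ordinary space remains printable.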
I see I was assigned this in the last meeting.
Personally, my preference is [[:print:]] over [:print:], as it's the standard syntax and, ironically, the extra couple of characters we add a few times amount to less text than the "for brevity" statement. It's not a hard "must be so" line, but if we're in agreement I'd prefer that change before merging. Otherwise I'm happy with it.
Can I get clarity on who is progressing this, please? I was assigned, but I gave my feedback over a year ago and the PR is unchanged. As far as I'm concerned, the ball is back with @jmarshall, but if you wish me to just make an editorial decision then I will amend [:print:] to [[:print:]] and merge.
The ball was indeed with me. See the new preview for how this uglifies [[:rname:^*=]][[:rname:]]*…