strip-periods on cs:text macro="x", cs:name-part, cs:et-al
The issues are summarized here:
http://forums.zotero.org/discussion/15166/stripperiods-not-working-yet/#Comment_95398
The essentials of the proposal are:
- To not permit strip-periods on cs:text when it calls a macro;
- To permit strip-periods on cs:name-part; and
- To permit strip-periods on cs:et-al (see followup post below).
The first of these changes would affect some existing styles, making some previously valid CSL invalid. On the other hand, strip-periods on cs:text that calls a macro is likely to break in unpredictable ways, and should probably be nipped in the bud.
[I should note the view of the OP in the linked thread above, which is that strip-periods should just replace periods with an empty string. This strikes me now as a perfectly sensible thing to do, but I have a nagging feeling that the current behavior -- to preserve a space between "J" and "R" in "J.R. Ewing" after conversion -- was driven by user request, and will cause problems if withdrawn. I'm kind of in a mood to be told what to do on this one, so I can avoid responsibility. :) ]
Frank
On the behavior of strip-periods, I've made a change to the processor, accepting the suggestion that strip-periods should only delete periods from the target string, and not add space as a placeholder. The specification doesn't call for adding space, and on the plain meaning of the term, one wouldn't expect it to. The impact in the test suite is actually very small, and the change will make previously bad output correct (for journal abbreviations) in the one style known to be affected. So that part of this kerfuffle can be set aside.
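To make the difference concrete, here is a minimal Python sketch of the two behaviors. It is not processor code: the function names are hypothetical, and the old behavior is only approximated from the description above (a space left behind where a period sat directly between two characters).

```python
import re


def strip_periods_plain(text):
    """New behavior: delete periods and add nothing in their place."""
    return text.replace(".", "")


def strip_periods_with_space(text):
    """Approximation of the old behavior (an assumption, not the real code):
    a period directly followed by a non-space character becomes a space,
    and every other period is simply deleted."""
    spaced = re.sub(r"\.(?=\S)", " ", text)
    return spaced.replace(".", "")


print(strip_periods_plain("J.R. Ewing"))       # JR Ewing
print(strip_periods_with_space("J.R. Ewing"))  # J R Ewing
```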
The issue of where to allow strip-periods is still a live one. In addition to closing it out on cs:text with macro="xxx" and allowing it on cs:name-part, adamsmith has asked that it also be allowed on cs:et-al.
So the use case for strip-periods on cs:name-part is to normalize "Jane M. Doe" and "John N Doe" to "Jane M Doe" and "John N Doe". That still leaves no way to go in the other direction (to turn "Jane M. Doe" and "John N Doe" into "Jane M. Doe" and "John N. Doe").
And I'm not sure I like allowing strip-periods on cs:et-al. "et al." is an abbreviation. If a style deviates from the normal behavior of adding a period, they should do it consistently in the citations and bibliography.
On the general strip-periods issue, the problem is that currently, strip-periods can be applied on cs:text with macro="xxx". That covers this use case, but it's risky: if the output formatter (which is pluggable in our implementations so far) wraps some part of the macro content in a link (containing a domain name) for, say, RDFa support, it's not obvious how to prevent the strip-periods function from clobbering the periods in the domain name. The proposal to restrict its use on cs:text where macro="xx" is aimed at heading off trouble there down the road.
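The hazard can be illustrated with a small Python sketch. Everything in it is hypothetical (the `link` helper stands in for a pluggable output formatter, and `example.org` is a placeholder URL); it only shows what happens if periods are stripped after the formatter has already wrapped the text.

```python
def strip_periods(text):
    """Delete every period in the string."""
    return text.replace(".", "")


def link(url, label):
    """Hypothetical output-formatter hook that wraps a label in a link."""
    return f'<a href="{url}">{label}</a>'


# The macro output has already been decorated by the formatter...
formatted = link("http://example.org/doe", "J.R. Doe")

# ...so stripping periods at this stage also damages the domain name.
clobbered = strip_periods(formatted)
print(clobbered)  # <a href="http://exampleorg/doe">JR Doe</a>
```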
While it's true that Jane M. Doe / Jane M Doe round-tripping is not covered, conversion between those two is similar to title case vs sentence case, in that it's much easier to convert reliably in one direction (removing periods) than the other (figuring out automagically where periods should be inserted). If I remember right, that's what motivated the introduction of strip-periods (and the refactoring of the locales) in CSL 1.0 in the first place.
In any case, if strip-periods is banned on cs:text with macro="xx" (which I think it should be, for the reasons given above), it should be allowed on cs:name-part, since we already have use cases for stripping periods from names out in the wild.
On allowing strip-periods on et-al, it's a term like any other, and if only for the sake of consistency, I wouldn't think there would be a problem with allowing strip-periods on the cs:et-al node in order to reach it. On the other hand, the availability of the "and others" alternative offers some flexibility, so denying it is unlikely to cause serious problems; if anyone needs different punctuation in citation and in bib, they can just route around the limitation.
Right. I'm wondering if it makes sense to allow strip-periods on cs:name-part and cs:et-al in CSL 1.0.1, but only forbid it on cs:text calling macros once we hit 1.1, as the latter is a backwards-incompatible change. In the meantime, we can make a note in the spec and CSL processors can just ignore it.
I like that idea.
+1
We've run into a problem with strip-periods on macro calls in the McGill Guide style, serious enough that I've worked out a solution that will cope with the specification as it currently stands. Once I have the code in place, I'll be more sanguine about leaving things as they are.
Given that the change would be backward incompatible, I won't push for it myself: but if it will make life simpler for other implementers, I wouldn't object either.
A patch for allowing strip-periods on cs:name-part is available for prereview here.
Do we still need "strip-periods" on cs:name-part if we introduce the "initialize" inheritable name option? That seems like the cleaner solution. (see #60)
We don't!
I'm also inclined to keep strip-periods off cs:et-al for now. The use case adamsmith brings up in the Zotero forum thread linked to above doesn't make sense: using "et al." (with period) in cites and "et al" (without) in bibliographic entries. "Et al." is an abbreviation, and if a style chooses to forgo the period, then at least it should be consistent. Furthermore, there is a workaround (as indicated in the thread).
Fine with me.
I'll remove the branch, and open one for "initialize". The only remaining question for this ticket is whether to allow strip-periods on cs:text with macro="XXX". If it is not to be banned, the ticket can be closed.
I think we should (as discussed above):
- for CSL 1.0.1, have the CSL processor ignore "strip-periods" on cs:text calling a macro and manually touch up the affected styles to make sure we can do without
- for CSL 1.1, change the schema to really disallow this use of strip-periods
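As a rough aid for the manual touch-up step, something like the following Python sketch could list the macro calls in a style that carry strip-periods. The function name and the sample style are made up for illustration; only the CSL namespace URI and the attribute names are taken from CSL itself.

```python
import xml.etree.ElementTree as ET

CSL_NS = "http://purl.org/net/xbiblio/csl"

# Tiny made-up style fragment, just enough to exercise the check.
SAMPLE = """<style xmlns="http://purl.org/net/xbiblio/csl" version="1.0">
  <macro name="author"><names variable="author"><name/></names></macro>
  <citation><layout>
    <text macro="author" strip-periods="true"/>
    <text variable="container-title" form="short" strip-periods="true"/>
  </layout></citation>
</style>"""


def macro_calls_with_strip_periods(csl_xml):
    """Return the names of macros called via cs:text with strip-periods="true"."""
    root = ET.fromstring(csl_xml)
    hits = []
    for node in root.iter(f"{{{CSL_NS}}}text"):
        # Only cs:text nodes that call a macro are of interest here.
        if "macro" in node.attrib and node.get("strip-periods") == "true":
            hits.append(node.get("macro"))
    return hits


print(macro_calls_with_strip_periods(SAMPLE))  # ['author']
```

The second cs:text in the sample renders a variable rather than a macro, so it is left alone.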
We do need it, if neither "initialize" nor "strip-periods" is available on cs:name-part. As I wrote above:
We've run into a problem with strip-periods on macro calls in the McGill Guide style, serious enough that I've worked out a solution that will cope with the specification as it currently stands. Once I have the code in place, I'll be more sanguine about leaving things as they are.
I have strip-periods working on cs:text with macro="X". We could do without it if "initialize" is introduced, but until then we still need it, to cover McGill (and the New Zealand Legal style, which is also allergic to periods).
Yes, I should have made explicit that my solution above is dependent on #60 being given the green light.
Above, I wrote:
if the output formatter (which is pluggable in our implementations so far) wraps some part of the macro content in a link (containing a domain name) for, say, RDFa support, it's not obvious how to prevent the strip-periods function from clobbering the periods in the domain name.
With more careful thought, I've managed to solve this rather simply in citeproc-js, by just stripping periods from input strings before they are fed to the output queue (the internal representation from which formatted output is produced). The stripping itself is that simple; nothing further is required. The only wrinkle is that the value of the strip-periods option needs to be tracked stackwise during the first phase of output (i.e. when writing to the internal representation), so that it takes effect only within the macro on which it has been set.
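A minimal Python sketch of the idea follows. It is not the citeproc-js code: the class and method names are hypothetical, the output queue is reduced to a flat list of (text, link) pairs, and it assumes that a nested macro stays inside the scope of an enclosing macro that has strip-periods set.

```python
from contextlib import contextmanager


class OutputQueue:
    """Toy internal representation: a flat list of (text, link) pairs."""

    def __init__(self):
        self.items = []
        self._strip_stack = [False]  # strip-periods state, innermost scope last

    @contextmanager
    def macro(self, strip_periods=False):
        # Assumption: once set on a macro, the option also covers nested macros.
        self._strip_stack.append(self._strip_stack[-1] or strip_periods)
        try:
            yield self
        finally:
            self._strip_stack.pop()

    def append(self, text, link=None):
        # Periods are stripped from the input string, before any decoration.
        if self._strip_stack[-1]:
            text = text.replace(".", "")
        self.items.append((text, link))

    def render(self):
        # Links are applied only here, so their URLs are never touched.
        return "".join(
            f'<a href="{url}">{text}</a>' if url else text
            for text, url in self.items
        )


queue = OutputQueue()
with queue.macro(strip_periods=True):
    queue.append("J.R. Doe", link="http://example.org/doe")
queue.append(", et al.")
print(queue.render())  # <a href="http://example.org/doe">JR Doe</a>, et al.
```

Because the link is attached at render time, the domain name keeps its periods, and text written after the macro closes is unaffected.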
We should probably hear back from other implementers before making a move, but I think this should be easy enough to handle. It certainly is convenient in CSL code to be able to add strip-periods to an existing macro, so on balance I would be in favor of overturning my original suggestion, and allowing <text macro="x" strip-periods="true"/>.