
Capitalized words after colons not brace protected

Open rzach opened this issue 3 years ago • 23 comments

A Zotero title

Simplex sigillum veri: Peano, Frege, and Peirce on the primitives of logic

is exported as

title = {Simplex Sigillum Veri: Peano, {{Frege}}, and {{Peirce}} on the Primitives of Logic}

The proper name "Peano" should be case protected. Obvs you can't tell if the word after a ":" is a proper name, but "Blah blah: Soandso on blah" is a very common title-subtitle pattern so I think the default should be to protect a capitalized word after a colon. The right-click sentence caser will lowercase words following colons, so if the title is correctly cased in Zotero it should be correct on export.

I don't know if it's a transient issue or another bug, but debug logging doesn't seem to get enabled when I click on the "restart Zotero with debugging enabled" button, and when I try to submit a debug log anyway it just hangs on the "Please wait while a support log is submitted". Hence, no support log ID, sorry.

Zotero version: 5.0.96.3

BBT version: 5.6.7

Support log ID:

Exporter used: Better BibLaTeX

Expected behavior: title = {Simplex Sigillum Veri: {{Peano}}, {{Frege}}, and {{Peirce}} on the Primitives of Logic}

Actual behavior: title = {Simplex Sigillum Veri: Peano, {{Frege}}, and {{Peirce}} on the Primitives of Logic}

rzach avatar Nov 06 '21 16:11 rzach

Sorry about the missing log ID but it doesn't want to submit a BBT log. The Zotero log says this when I try:

(3)(+0031685): {better-bibtex error} +106819 failed to submit undefined <Error: TypeError: this.errorlog is undefined in chrome://zotero-better-bibtex/content/better-bibtex.js line undefined
  zip@chrome://zotero-better-bibtex/content/better-bibtex.js:95581:44
  send@chrome://zotero-better-bibtex/content/better-bibtex.js:95536:23
  anonymous@chrome://global/content/bindings/wizard.xml line 432 > Function:3:1
  _fireEvent@chrome://global/content/bindings/wizard.xml:433:28
  set_currentPage@chrome://global/content/bindings/wizard.xml:103:11
  advance@chrome://global/content/bindings/wizard.xml:290:15
  @chrome://global/content/bindings/wizard.xml:157:15
  errorReport@chrome://zotero-better-bibtex/content/better-bibtex.js:88666:7
  oncommand@chrome://zotero/content/standalone/standalone.xul:1:1
>

Support log ID:

rzach avatar Nov 06 '21 16:11 rzach

Support log ID: EV75ST2T-refs-euc

(This is with an older version of BBT; newest version won't submit log)

rzach avatar Nov 06 '21 18:11 rzach

There isn't an earlier error in 5.6.7?

retorquere avatar Nov 06 '21 21:11 retorquere

There is; I posted the full log to #1979.

rzach avatar Nov 06 '21 21:11 rzach

:robot: this is your friendly neighborhood build bot announcing test build 5.6.7.1778 ("less downloads from 0x0")

Install in Zotero by downloading test build 5.6.7.1778, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

github-actions[bot] avatar Nov 06 '21 21:11 github-actions[bot]

The build numbers are tied to the issue, so it's easier for me if the logs are posted on the corresponding issue.

retorquere avatar Nov 06 '21 22:11 retorquere

A new build will drop here shortly

retorquere avatar Nov 06 '21 22:11 retorquere

:robot: this is your friendly neighborhood build bot announcing test build 5.6.7.1779 ("stray globals")

Install in Zotero by downloading test build 5.6.7.1779, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

github-actions[bot] avatar Nov 06 '21 22:11 github-actions[bot]

It works! Support log ID: 9JLKYJSM-refs-euc

rzach avatar Nov 06 '21 22:11 rzach

The proper name "Peano" should be case protected. Obvs you can't tell if the word after a ":" is a proper name, but "Blah blah: Soandso on blah" is a very common title-subtitle pattern so I think the default should be to protect a capitalized word after a colon.

I can't really judge how common this is; wouldn't Bla blah: A treatise on... be equally likely?

The general Zotero facility for something to be protected from case-meddling (which you want for names, but also words like "iPod") is to enclose it in <span class="nocase">...</span> (I think that <nc>...</nc> does the same).

The right-click sentence caser will lowercase words following colons, so if the title is correctly cased in Zotero it should be correct on export.

The right-click sentence-caser lowercases almost everything. It's not very smart.

retorquere avatar Nov 06 '21 23:11 retorquere

I guess what I do when I add something to Zotero is to sentence case it and fix any proper names. If BBT treats capitalized things differently depending on whether they follow a colon or not I'll have to remember to type <span class="nocase">...</span> every time. I mean it's not super inconvenient but it is inconsistent, and what I love about BBT is that it's consistent and lets me leave almost everything in Zotero the way it comes from the publisher metadata. Since BBT title-cases things on export anyway, the only case where you don't want to have the capital after : protected would be if you needed to force it to become lowercase when Bib(la)TeX runs over it. That seems to me to be the less likely case. And it can be accomplished using Bla blah: <span class="nocase">a</span> treatise on..., right?

rzach avatar Nov 06 '21 23:11 rzach

Since BBT title-cases things on export anyway, the only case where you don't want to have the capital after : protected would be if you needed to force it to become lowercase when Bib(la)TeX runs over it. That seems to me to be the less likely case. And it can be accomplished using Bla blah: <span class="nocase">a</span> treatise on..., right?

No, because that would always force it to be {{a}}, with no recourse for you to change that behavior. There can be styles that want to change case on that word; I can only case-protect-by-default if I'm sure it's a proper noun that you never want case-conversion on.

That BBT takes matters into its own hands for mid-sentence words that have capitals is actually a bit iffy, but at least I've not yet found an exception to that rule. Consistency can come in many forms; in this case the consistency is "don't touch words if you're not sure they're proper nouns". The non-iffy behavior would be to only do explicit case conversions, but that gets annoying fast. TBH I think that default-sentencecase (as Zotero and other CSL processors do it) is a much better choice than titlecase-default. Getting stuff into title-case is one of the more complicated things in BBT, even though I'm re-using the title-caser from Zotero, just augmenting its behavior.

I'm not arguing for the sake of arguing BTW, I think you make a valid case, it's just that I have a lot of stuff to take into account. A lot of this stuff I had no idea was coming my way; it's reports like yours that drive the development of BBT.

retorquere avatar Nov 07 '21 09:11 retorquere

Ok I think I understand the problem a little better. Some citation styles want the first word of the subtitle (after the :) to always be capitalized and some don't. BBT wants a nice way to decide which words to protect from capitalization changes. That is to protect all the words that are capitalized in Zotero, title case everything else, then let Bib(La)TeX sort out the capitalization of unprotected words. The current way, however, does not protect words after a :. I'm assuming that's because if you capitalize a subtitle in Zotero, the first word will be protected by BBT on export, and that will prevent BibTeX from lowercasing it if the style demands lowercase subtitles. So protecting the A in Bla blah: A treatise on... would not work if your style demands a lowercase a. Right?

I guess what I don't quite get is what would break if you did it uniformly (i.e., switch back to protecting capitalized words after a :), as long as your Zotero database follows the prescription to sentence case titles except proper names (i.e., as long as you have Bla blah: a treatise on... in your Zotero database, which is what you recommend). It seems to work in Zotero's own bibliography generator, e.g., Vancouver will have sentence case with lowercase a and APA will have sentence case with uppercase A.

Another thing to consider might be that : is not the only way to separate titles from subtitles (eg, periods and dashes are also used). So BBT would treat some subtitles one way and others a different way, also inconveniently inconsistent.

(I'm also not arguing for argument's sake: I guess I can live with either way; and if it stays the way it is my only request would be to add this to the FAQ at https://retorque.re/zotero-better-bibtex/support/faq/

Thank you btw for this wonderful tool, it was what finally allowed me to switch from keeping my BIB files in plain text format and start using Zotero. I think I have not had to use any <span class="nocase"> at all so far; so I was a bit disappointed that I would now have to go through and fix all proper names after colons. )

rzach avatar Nov 08 '21 18:11 rzach

Some citation styles want the first word of the subtitle (after the :) to always be capitalized and some don't. BBT wants a nice way to decide which words to protect from capitalization changes.

Doesn't need to be nice, but since it's algorithmic, it needs to be consistent and err on the side of caution. It's not so much that BBT decides which words to protect; the baseline behavior is that it just follows the nocase declarations. Rather, there is a limited set of circumstances where BBT can safely override the meaning of what the user entered (that meaning being "don't case-protect any of this"). For inner-title words, it's been safe (so far) to infer it. But there is no way for the user to override that inference, so I can only do it where I am certain.

That is to protect all the words that are capitalized in Zotero

This is actually the exception to the rule, not the rule. The rule is to follow the explicit caps-protection as entered by the user. Caps-protection is just rarely needed for Zotero itself, as it expects sentence case input, so the only protection you really need if you use Zotero with Word/LibreOffice/GDocs is for words that you want to remain lowercase; you wouldn't generally protect proper nouns if you used Zotero only with Word etc. Between these two opposing uses of the caps protection, I tried to come up with something that helps the user where it can, but still leaves the user in control in ambiguous situations.
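To make the inference for inner-title words concrete, here is a minimal sketch. It is not BBT's actual code (BBT builds on the citeproc title-caser and also title-cases the output, which this sketch skips). The function name `protectCapitals` and the `protectAfterColon` option are hypothetical, introduced only to contrast the two behaviors discussed in this thread.

```javascript
// Hypothetical sketch, NOT BBT's implementation: wrap capitalized inner-title
// words in {{...}} so Bib(La)TeX styles cannot change their case.
function protectCapitals(title, { protectAfterColon = false } = {}) {
  const words = title.split(' ');
  return words
    .map((word, i) => {
      // Only words that start with an ASCII capital are candidates.
      if (!/^[A-Z]/.test(word)) return word;
      // The first word of the title is capitalized regardless, so leave it alone.
      if (i === 0) return word;
      // A word right after a colon may be a subtitle start rather than a proper noun.
      const afterColon = words[i - 1].endsWith(':');
      if (afterColon && !protectAfterColon) return word;
      // Keep trailing punctuation (such as a comma) outside the braces.
      return word.replace(/^([A-Za-z'’-]+)(.*)$/, '{{$1}}$2');
    })
    .join(' ');
}

const example = 'Simplex sigillum veri: Peano, Frege, and Peirce on the primitives of logic';
console.log(protectCapitals(example));
console.log(protectCapitals(example, { protectAfterColon: true }));
```

With the default, "Peano" stays unprotected, as in the reported export; with `protectAfterColon: true` it is wrapped like "Frege" and "Peirce", which is the behavior requested in this issue.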

The current way, however, does not protect words after a :, I'm assuming that's because if you capitalize a subtitle in Zotero, the first word will be protected by BBT on export, and that will prevent BibTeX from lowercasing it if the style demands lowercase subtitles. So protecting the A in Bla blah: A treatise on... would not work if your style demands a lowercase a. Right?

Yes, correct.

I guess what I don't quite get is what would break if you did it uniformly (ie switch back to protecting capitalized words after a :)

This has never (intentionally) been BBT's behavior.

as long as your Zotero database follows the prescription to sentence case titles except proper names (i.e., as long as you have Bla blah: a treatise on...

That's not how sentence case is supposed to be as far as I know: https://apastyle.apa.org/style-grammar-guidelines/capitalization/sentence-case. And (to my knowledge) Zotero simply passes the input out unchanged for sentence-cased CSL styles, and in that case, APA at least would render incorrectly if you wrote : a treatise.

Another thing to consider might be that : is not the only way to separate titles from subtitles (eg, periods and dashes are also used).

Yep, and those will trigger the same behavior.

So BBT would treat some subtitles one way and others a different way, also inconveniently inconsistent.

In which case I would fix that inconsistency where I can.

(I'm also not arguing for argument's sake: I guess I can live with either way; and if it stays the way it is my only request would be to add this to the FAQ at https://retorque.re/zotero-better-bibtex/support/faq/

That seems reasonable. The FAQ is user-editable BTW; I can add it of course, but I'm sort of blind to the best way to phrase these things, since I know BBT too well.

Thank you btw for this wonderful tool, it was what finally allowed me to switch from keeping my BIB files in plain text format and start using Zotero. I think I have not had to use any <span class="nocase"> at all so far; so I was a bit disappointed that I would now have to go through and fix all proper names after colons. )

If you want to automate that, a postscript will do it:

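// Wrap a capitalized word at the start of the title, or right after a colon, in a nocase span
if (Translator.BetterTeX && reference.has.title) {
  const title = reference.has.title.value.replace(/(^|(:\s*))([A-Z][^\s]*)/g, '$1<span class="nocase">$3</span>')
  reference.add({ name: 'title', value: title })
}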

But that will then also catch capitalized leading non-nouns as discussed above. Whether that is ever a problem for you depends on the style(s) you use.
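To see what that regex catches, it can be tried on plain strings outside Zotero. The sketch below only exercises the regular expression from the postscript; the wrapper name `protectLeadingCapitals` is hypothetical and not part of BBT.

```javascript
// Standalone demo of the postscript's regex; the helper name is hypothetical.
function protectLeadingCapitals(title) {
  // $1 is either the start of the string or a colon plus following whitespace;
  // $3 is the capitalized word, including anything up to the next whitespace.
  return title.replace(/(^|(:\s*))([A-Z][^\s]*)/g, '$1<span class="nocase">$3</span>');
}

console.log(protectLeadingCapitals('Simplex sigillum veri: Peano, Frege, and Peirce on the primitives of logic'));
// <span class="nocase">Simplex</span> sigillum veri: <span class="nocase">Peano,</span> Frege, and Peirce on the primitives of logic

console.log(protectLeadingCapitals('Bla blah: A treatise on...'));
// <span class="nocase">Bla</span> blah: <span class="nocase">A</span> treatise on...
```

Two things stand out: the first word of the title is wrapped too, and because `[^\s]*` runs to the next whitespace, trailing punctuation such as the comma after "Peano" ends up inside the span.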

retorquere avatar Nov 08 '21 21:11 retorquere

Just one thing: An article keyed into Zotero as Editors' introduction: mathematical methods in philosophy (which is what Zotero will produce if you right-click and select sentence case on the title) gives the following in APA (capitalize in sentence case after colons):

Antonelli, A., Urquhart, A., & Zach, R. (2008). Editors’ introduction: Mathematical methods in philosophy. Review of Symbolic Logic, 1(2), 143–145. https://doi.org/10.1017/S1755020308080131

and in Vancouver (which wants lowercase after colon):

  1. Antonelli A, Urquhart A, Zach R. Editors’ introduction: mathematical methods in philosophy. Review of Symbolic Logic. 2008;1(2):143–5.

So Zotero with Word will give you the right results if you lowercase words after the colon. That's what I thought Zotero wants you to do.

On the other hand, if you key it in as Editors' introduction: Mathematical methods in philosophy in Zotero and then generate using Vancouver style you get the incorrect:

  1. Antonelli A, Urquhart A, Zach R. Editors’ introduction: Mathematical methods in philosophy. Review of Symbolic Logic. 2008;1(2):143–5.

So as far as I can tell, Zotero wants you to lowercase after a colon (unless it is a proper noun), so case protecting a capitalized word after a colon would only result in an incorrectly formatted field on the bib(la)tex side if you first incorrectly capitalized it in Zotero.
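The observation can be modeled in a few lines. This is a toy model of the outputs shown above, not citeproc's implementation: the function `renderTitle` and its `capitalizeSubtitle` flag are hypothetical, standing in for an APA-like style (flag on) and a Vancouver-like style (flag off).

```javascript
// Toy model of the observed outputs; NOT how citeproc is implemented.
function renderTitle(stored, { capitalizeSubtitle }) {
  // A Vancouver-like style passes the stored sentence-case title through unchanged.
  if (!capitalizeSubtitle) return stored;
  // An APA-like style uppercases a lowercase letter that follows a colon.
  return stored.replace(/(:\s+)([a-z])/g, (match, sep, letter) => sep + letter.toUpperCase());
}

const lower = "Editors' introduction: mathematical methods in philosophy";
const upper = "Editors' introduction: Mathematical methods in philosophy";

console.log(renderTitle(lower, { capitalizeSubtitle: true }));  // APA-like, capital M
console.log(renderTitle(lower, { capitalizeSubtitle: false })); // Vancouver-like, lowercase m
console.log(renderTitle(upper, { capitalizeSubtitle: false })); // capital M survives, wrong for Vancouver
```

In this model a style can raise a lowercase letter but has no way to tell whether a stored capital should be lowered, which is why the lowercase form is the one that works for both styles.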

The BBT behavior after dashes is not the same right now as after colons. It case protects capitals after dashes (but not after colons): Editors' introduction—Mathematical methods in philosophy in Zotero gives me title = {Editors' Introduction\textemdash{{Mathematical}} Methods in Philosophy}, in the exported bib file. Same with a period, or space-hyphen-space.

rzach avatar Nov 08 '21 22:11 rzach

Just one thing: An article keyed into Zotero as Editors' introduction: mathematical methods in philosophy (which is what Zotero will produce if you right-click and select sentence case on the title)

The Zotero sentence caser isn't great, so I don't take its behavior to be authoritative on anything. Try sentence-casing

A Piece on <span class="nocase">Kant</span>

So as far as I can tell, Zotero wants you to lowercase after a colon (unless it is a proper noun), so case protecting a capitalized word after a colon would only result in an incorrectly formatted field on the bib(la)tex side if you first incorrectly capitalized it in Zotero.

Fair point -- I'll check in with the citeproc people. If this is correct, and it looks to be, I'll change the behavior of BBT.

The BBT behavior after dashes is not the same right now as after colons. It case protects capitals after dashes (but not after colons): Editors' introduction—Mathematical methods in philosophy in Zotero gives me title = {Editors' Introduction\textemdash{{Mathematical}} Methods in Philosophy}, in the exported bib file. Same with a period, or space-hyphen-space.

That's a bug. I'll add the em-dash.

retorquere avatar Nov 08 '21 22:11 retorquere

BTW are you sure that this didn't recently change? I have exported bib files from June and earlier that definitely protect uppercase letters after colons.

rzach avatar Nov 08 '21 23:11 rzach

If that happened it wasn't intentional. I reuse and augment the citeproc title caser (citeproc is what Zotero uses); it may have changed its behavior, but it would be a surprise to me if that hadn't been covered by my test suite.

I've put out a request for clarification on the zotero-dev list, but your test seems conclusive to me. It is however a change in behavior, so I want to give it a day or so to get a response from the dev list before I make the change.

retorquere avatar Nov 08 '21 23:11 retorquere

Well, that was fast. Zotero-dev confirms your analysis; tests are running on the BBT change.

retorquere avatar Nov 09 '21 01:11 retorquere

:robot: this is your friendly neighborhood build bot announcing test build 5.6.8.1803 ("adjust tests")

Install in Zotero by downloading test build 5.6.8.1803, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

github-actions[bot] avatar Nov 10 '21 14:11 github-actions[bot]

:robot: this is your friendly neighborhood build bot announcing test build 5.6.8.1805 ("test case for #1978")

Install in Zotero by downloading test build 5.6.8.1805, opening the Zotero "Tools" menu, selecting "Add-ons", opening the gear menu in the top right, and selecting "Install Add-on From File...".

github-actions[bot] avatar Nov 10 '21 22:11 github-actions[bot]

Does build 1805 work as you expect?

retorquere avatar Nov 16 '21 09:11 retorquere

Just tested; yes it does. Thanks!

rzach avatar Nov 16 '21 16:11 rzach