Automated help with capitalization
Our instruction that series names and talk titles should capitalize only the first word and proper nouns is currently followed only about 50 percent of the time (if you look at the browse page and see evidence to the contrary, that is simply because I just edited a whole bunch of them).
There are now too many talks being added for us to consistently enforce our house style by hand. I'm also somewhat sympathetic to organizers, who in most cases are simply cutting and pasting titles from another page that uses a different capitalization convention rather than typing them in.
I think we could greatly reduce the number of improperly capitalized titles if we had a dictionary of common title words that we can be fairly confident will never be part of a proper noun (e.g. "problem", "question", "proof", "group", "field", "geometry", ...) and automatically switched them to lower case whenever they are not the first word. We should warn the user when this has been done, which would hopefully prompt them to fix any words we miss. This wouldn't completely solve the problem, but it might reduce it to one we can actually handle by hand.
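A minimal sketch of the dictionary idea, assuming titles are plain strings and that a short hand-curated word list is acceptable. `COMMON_WORDS` and `lowercase_common_words` are hypothetical names, and the word list is only a starter drawn from the examples above. The function returns the changed words so the form can show the organizer a warning.

```python
from typing import List, Tuple

# Hypothetical starter dictionary: words assumed never to be part of a proper noun.
# A real list would be curated by looking at titles already in the database.
COMMON_WORDS = {
    "problem", "problems", "question", "questions", "proof", "proofs",
    "group", "groups", "field", "fields", "geometry",
}

PUNCTUATION = ".,:;!?()[]\"'"


def lowercase_common_words(title: str) -> Tuple[str, List[str]]:
    """Lowercase capitalized dictionary words that are not the first word.

    Returns the adjusted title plus the words that were changed, so the
    form can warn the organizer and prompt them to fix anything we missed.
    """
    words = title.split(" ")
    changed = []
    for i, word in enumerate(words):
        if i == 0:
            continue  # the first word keeps its capital
        core = word.strip(PUNCTUATION)  # e.g. "Geometry:" -> "Geometry"
        # Only touch Capitalized words; leave ALL-CAPS acronyms alone.
        if core[:1].isupper() and core[1:].islower() and core.lower() in COMMON_WORDS:
            words[i] = word.replace(core, core.lower(), 1)
            changed.append(core)
    return " ".join(words), changed


if __name__ == "__main__":
    new_title, changed = lowercase_common_words(
        "A Proof of the Main Conjecture for Abelian Groups")
    if changed:
        print("We lowercased:", ", ".join(changed), "; please check:", new_title)
```

In the usage example, "Main" and "Conjecture" are left alone because they are not in the list; that is exactly the case where the warning should nudge the organizer to finish the job by hand.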
We could also simply alert users when a high percentage of the words in a title are capitalized; we can tune the threshold by looking at the titles already in the database. Obviously we could make this much fancier, but transparency might also help, and this approach would generalize across fields.
(In reply to Andrew Sutherland, Fri, May 1, 2020, 07:43; GitHub issue https://github.com/roed314/seminars/issues/331.)
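A sketch of the threshold alert, assuming we only look at letter case (no dictionary), which is why it should carry over to other fields. `THRESHOLD = 0.5` is a placeholder, not a measured value; the idea would be to run `capitalized_fraction` over the existing database and pick a cutoff from that distribution. Both function names are hypothetical.

```python
# Characters stripped from word edges before inspecting capitalization.
PUNCTUATION = ".,:;!?()[]\"'"

# Hypothetical cutoff; it would be tuned by running capitalized_fraction
# over the titles already in the database.
THRESHOLD = 0.5


def capitalized_fraction(title: str) -> float:
    """Fraction of words after the first that start with a capital letter."""
    words = [w.strip(PUNCTUATION) for w in title.split()]
    words = [w for w in words if w[:1].isalpha()]  # ignore math, numbers, etc.
    rest = words[1:]
    if not rest:
        return 0.0
    return sum(w[0].isupper() for w in rest) / len(rest)


def should_warn(title: str, threshold: float = THRESHOLD) -> bool:
    """True if the title looks like title case rather than sentence case."""
    return capitalized_fraction(title) > threshold
```

For example, "On the Distribution of Prime Numbers" scores 3/5 and would trigger the alert, while "Modular forms and the Langlands program" scores 1/5 and would not.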
We might be able to use a spell checker to help with this: highlight any capitalized word other than the first that the spell checker recognizes as an ordinary dictionary word, and auto-suggest making it lower case.
I think we need to make it painless for organizers to cut and paste titles, and we should expect that half of them will have capitalization issues (this would be true whether we use sentence case or title case). A way to automatically adjust most titles (or perhaps just suggest the automated changes) would be worth a lot.
The irony is that if we used title case rather than sentence case it would be easy to automate capitalization (just capitalize everything except a short list of excluded words like "a" and "the").
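To make the title-case alternative concrete, here is a minimal sketch assuming a small exclusion list. `SMALL_WORDS` is a hypothetical list (style guides disagree on exactly which words belong in it), and the function only raises the first letter of lowercase words, so acronyms and existing capitals are left untouched.

```python
# Hypothetical exclusion list; real style guides differ on which words stay lowercase.
SMALL_WORDS = {
    "a", "an", "the", "and", "or", "but", "nor",
    "of", "in", "on", "at", "to", "for", "by", "with",
}


def title_case(title: str) -> str:
    """Capitalize every word except small words after the first."""
    out = []
    for i, word in enumerate(title.split(" ")):
        if i > 0 and word.lower() in SMALL_WORDS:
            out.append(word.lower())
        elif word[:1].islower():
            # Raise only the first letter; "WOW" or "Hodge" are left as typed.
            out.append(word[0].upper() + word[1:])
        else:
            out.append(word)
    return " ".join(out)
```

Words that begin with punctuation, such as "(the", are passed through unchanged, and mathematical words like "p-adic" would need special handling.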
Spell checking depends on the language, and it is really hard to identify proper nouns automatically.
(In reply to Andrew Sutherland, Fri, May 1, 2020, 8:30 AM.)
I was actually only suggesting identifying obviously non-proper nouns, and only for titles in English.
But conversion to sentence case is a well-studied NLP problem (often called truecasing), and there are a couple of open source packages that attempt to solve it (you can even train them on other languages); see, for example, https://github.com/daltonfury42/truecase and https://github.com/despawnerer/truecase. Keep in mind that perfection is not the goal: any improvement over 50/50 would be welcome. Alternatively, we could give up and switch to title case, which is easy to automate.
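The linked packages are the real candidates; the sketch below is only a toy illustration of the basic truecasing idea: learn, from titles that are already correctly capitalized, the most frequent casing of each word, then apply it. It is not how either package works internally and is much cruder (it ignores context, and punctuation attached to a word makes it a different token). `train` and `truecase` are hypothetical names, and the training titles in the usage note are made up for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def train(titles: Iterable[str]) -> Dict[str, Counter]:
    """Count how each word is cased in correctly capitalized titles."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for title in titles:
        # Skip the first word: its capital letter is forced by sentence case.
        for word in title.split()[1:]:
            counts[word.lower()][word] += 1
    return counts


def truecase(title: str, counts: Dict[str, Counter]) -> str:
    """Replace each word by its most frequent casing seen in training."""
    out = []
    for i, word in enumerate(title.split()):
        seen = counts.get(word.lower())
        best = seen.most_common(1)[0][0] if seen else word  # unseen: keep as typed
        if i == 0:
            best = best[:1].upper() + best[1:]  # sentence case: capitalize first word
        out.append(best)
    return " ".join(out)
```

Trained on the (made-up) titles "Primes in arithmetic progressions", "The Riemann hypothesis for curves", "Sums over primes and the Riemann zeta function" and "Elliptic curves over finite fields", this turns "Primes And The Riemann Hypothesis Over Finite Fields" into "Primes and the Riemann hypothesis over finite fields". Trained on the existing database, it would pick up whatever casing conventions are already there, good or bad.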
The first package looks quite good.
(In reply to Andrew Sutherland, Fri, May 1, 2020, 9:00 AM.)
I'm wondering if we should switch to title case for series names (but keep sentence case for talk titles). Many of them contain acronyms (e.g. "Waves in One World (WOW)") that look slightly silly in sentence case (e.g. "Waves in one world (WOW)"). It is also quite common to use title case for seminar/conference names and sentence case for talk titles (e.g. https://math.mit.edu/nt/stage.html). I think a pretty solid majority of organizers use title case for the seminar/conference name on their external web pages.
What do you think, @poonen?
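For the acronym example above, a naive sentence-case conversion can at least keep all-caps tokens as typed. This is a minimal sketch assuming acronyms are written in all caps; `keep` is a hypothetical parameter for proper nouns the organizer would list by hand, since detecting them automatically is the hard part. Mixed forms such as "L-functions" would still be mishandled.

```python
from typing import AbstractSet

PUNCTUATION = ".,:;!?()[]\"'"


def sentence_case(title: str, keep: AbstractSet[str] = frozenset()) -> str:
    """Lowercase every word after the first, except ALL-CAPS acronyms
    and words listed in `keep` (e.g. proper nouns supplied by the organizer)."""
    words = title.split(" ")
    for i, word in enumerate(words):
        core = word.strip(PUNCTUATION)
        is_acronym = len(core) > 1 and core.isupper()
        if i > 0 and not is_acronym and core not in keep:
            words[i] = word.lower()
    return " ".join(words)
```

With this, "Waves in One World (WOW)" becomes "Waves in one world (WOW)", and "The Langlands Program" with `keep={"Langlands"}` becomes "The Langlands program".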
I think we should stick to sentence case as the default, but not mandate it. Title case is more of an American thing, I think.