Empirical solution to name representation
The command:
grep -h '^(author ' page*.scm | sort | uniq | sed -e 's/^(author //' -e 's/)$//' -e 's/"//g' | grep -v others
gives all the names we have so far:
Adams, Norman
Anderson, Claude W
Anderson, Kenneth R
Ashley, J Michael
Baker, Henry G
Bartlett, Joel F
Bartley, David H
Barzilay, Eli
Bawden, Alan
Başar, R Emre
Benson Jr, Brent W
Bothner, Per
Boucher, Dominique
Boussinot, Frédéric
Bres, Yannis
Bruggeman, Carl
Carlstrom, Brian David
Cejtin, Henry
Chen, Pee-Hong
Ciabrini, Damien
Clements, John
Clinger, Will
Clinger, William D
Clinger, William
Cowley, Anthony
Danvy, Olivier
De Roure, David
DePristo, Mark
DeRoure, David
Derici, Caner
Desbien, Jocelyn
Dionne, Carl
Duba, Bruce F
Dwyer, Rex A
Dybvig, R Kent
Earl, Christopher
Epardaud, Stéphane
Farmer, William M
Feeley, Marc
Felleisen, Matthias
Findler, Robert Bruce
Flanagan, Cormac
Flatt, Matthew
Forin, Alessandro
Foster, Ian
Friedman, Daniel P
Friedman, Daniel P.
Fuchs, Matthew
Gasbichler, Martin
Germain, Guillaume
Ghuloum, Abdulaziz
Grossman, Dan
Guttman, Joshua D
Halstead Jr, Robert H
Hansen, Lars Thomas
Hanson, Chris
Hartheimer, Anne
Haynes, Christopher T
Haynes, Christopher T.
Hickey, Timothy J
Hieb, Robert
Hilsdale, Erik
Hudak, Paul
Jagannathan, Suresh
Jensen, John C
Katz, Morry
Keep, Andrew W
Kelsey, Richard A
Kelsey, Richard
Kimball, Aaron
Kranz, David A
Kranz, David
Krishnamurthi, Shriram
Lang, Kevin J
Lapalme, Guy
Loaiza, Juan R
Loitsch, Florian
Marshall, Joe
Masuhara, Hidehiko
McDermott, Drew
Meunier, Philippe
Might, Matthew
Miller, James S
Miller, James
Miller, Scott G
Mirani, Rajiv
Mohr, Eric
Monk, Leonard G
Monnier, Stefan
Moreau, Luc
Muller, Hans
Nagata, Akihito
Norvig, Peter
Oliva, Dino P
Ost, Eric
Pearlmutter, Barak A
Pettyjohn, Greg
Philbin, James
Philbin, Jim
Piquer, José
Piérard, Adrien
Pleban, Uwe F
Pleban, Uwe F.
Prabhu, Tarun
PreScheme, Multithreaded
Queinnec, Christian
Ramsdell, John D
Rees, Jonathan A
Rees, Jonathan
Ribbens, Daniel
Rose, John R
Rozas, Guillermo J
Rozas, Guillermo
Sabry, Amr Afaf
Sabry, Amr
Sarkar, Dipanwita
Schooler, Richard
Schultz, Ulrik P
Schultz, Ulrik Pagh
Serpette, Bernard P
Serpette, Bernard Paul
Serpette, Bernard
Serrano, Manuel
Shivers, Olin
Sperber, Michael
Stamos, James W
Steele Jr, Guy L
Steele Jr, Guy Lewis
Steele, Guy L
Sumii, Eijiro
Sussman, Gerald Jay
Swarup, Vipin
Tammet, Tanel
Taura, Kenjiro
Taylor, CJ
Teodosiu, Dan
Thanos, Dimitri
Thiemann, Peter
Tinker, Pete
Turcotte, Marcel
Van Horn, David
Vegdahl, Steven R
Vitek, Jan
Waddell, Oscar
Wand, Mitchell
Weeks, Stephen
Weis, Pierre
Weise, Daniel
Wilson, Jason
Wittenberger, J
Yonezawa, Akinori
Şenol, Çağdaş
It seems that most of these are straightforward European-style names.
These have Jr:
Benson Jr, Brent W
Halstead Jr, Robert H
Steele Jr, Guy L
Steele Jr, Guy Lewis
These seem like Japanese names:
Sumii, Eijiro
Taura, Kenjiro
Some authors' names are spelled differently in different papers. I'm not sure whether we should preserve this.
Here's what CSL-JSON expects:
"definitions": {
  "name-variable": {
    "anyOf": [
      {
        "type": "object",
        "properties": {
          "family": {
            "type": "string"
          },
          "given": {
            "type": "string"
          },
          "dropping-particle": {
            "type": "string"
          },
          "non-dropping-particle": {
            "type": "string"
          },
          "suffix": {
            "type": "string"
          },
          "comma-suffix": {
            "type": ["string", "number", "boolean"]
          },
          "static-ordering": {
            "type": ["string", "number", "boolean"]
          },
          "literal": {
            "type": "string"
          },
          "parse-names": {
            "type": ["string", "number", "boolean"]
          }
        },
        "additionalProperties": false
      }
    ]
  },
And what CFF expects:
"person": {
  "additionalProperties": false,
  "description": "A person.",
  "properties": {
    ...
    "family-names": {
      "description": "The person's family names.",
      "minLength": 1,
      "type": "string"
    },
    ...
    "given-names": {
      "description": "The person's given names.",
      "minLength": 1,
      "type": "string"
    },
    "name-particle": {
      "description": "The person's name particle, e.g., a nobiliary particle or a preposition meaning 'of' or 'from' (for example 'von' in 'Alexander von Humboldt').",
      "examples": [
        "von"
      ],
      "minLength": 1,
      "type": "string"
    },
    "name-suffix": {
      "description": "The person's name-suffix, e.g. 'Jr.' for Sammy Davis Jr. or 'III' for Frank Edwin Wright III.",
      "examples": [
        "Jr.",
        "III"
      ],
      "minLength": 1,
      "type": "string"
    },
    ...
The CFF person record supports other interesting data (e.g. website) that is not strictly related to names.
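To make the overlap between the two schemas concrete, here is a minimal Python sketch that converts a CSL-JSON style name object into a CFF person object. The function name `csl_to_cff` is hypothetical, and folding both CSL particle fields into CFF's single `name-particle` is an assumption made for illustration; neither schema prescribes this mapping.

```python
def csl_to_cff(csl_name):
    """Map a CSL-JSON name object to a CFF person object (illustrative only)."""
    # Field correspondence read off the two schema excerpts above.
    mapping = {
        "family": "family-names",
        "given": "given-names",
        "suffix": "name-suffix",
    }
    # Skip missing or empty values, since CFF requires minLength 1.
    person = {cff_key: csl_name[csl_key]
              for csl_key, cff_key in mapping.items()
              if csl_name.get(csl_key)}
    # Assumption: CFF has one particle field, so either CSL particle lands there.
    particle = (csl_name.get("non-dropping-particle")
                or csl_name.get("dropping-particle"))
    if particle:
        person["name-particle"] = particle
    return person


steele = {"family": "Steele", "given": "Guy L", "suffix": "Jr"}
print(csl_to_cff(steele))
```

The mapping is lossy in the CSL-to-CFF direction: the dropping versus non-dropping distinction and `comma-suffix` have no counterpart in the CFF excerpt shown.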
@omasanori Would you like to try synthesizing a name representation that works for the names listed above, drawing on these schemas and the BibTeX format?
Yeah, I will try. Thank you so much for your survey, @lassik!
> comma-suffix
Wow, CSL can distinguish "John Doe, Jr." from "John Doe Jr."
> Some authors' names are spelled differently in different papers. I'm not sure whether we should preserve this.
It is probably fine to unify "Friedman, Daniel P" and "Friedman, Daniel P." into one "Daniel P. Friedman", for instance. The situation would be awful if we had found "J. McCarthy", since that could be either John McCarthy or Jay A. McCarthy in the context of Lisp dialects. In general, if we are confident, we can unify; otherwise we should keep the names as-is.
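One way to find the safe candidates for unification is to group names that differ only in periods. A rough Python sketch, assuming the names are already available as "Family, Given" strings like the list above; `normalize` and `group_variants` are hypothetical helpers, and the grouping only flags trivially equal spellings (it would not merge "Clinger, Will" with "Clinger, William", which needs human judgment).

```python
from collections import defaultdict


def normalize(name):
    """Drop periods and collapse whitespace so 'Daniel P.' equals 'Daniel P'."""
    return " ".join(name.replace(".", " ").split())


def group_variants(names):
    """Return lists of distinct spellings that normalize to the same string."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name)
    return [sorted(v) for v in groups.values() if len(v) > 1]


names = [
    "Friedman, Daniel P",
    "Friedman, Daniel P.",
    "Clinger, Will",
    "Clinger, William",
    "Pleban, Uwe F",
    "Pleban, Uwe F.",
]
print(group_variants(names))
```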
On (probably) Japanese names, I found five:
- Masuhara, Hidehiko
- Nagata, Akihito
- Sumii, Eijiro
- Taura, Kenjiro
- Yonezawa, Akinori
They all follow the Family, Given format so the sorting is okay.
I wonder how Van Horn, David is represented. Is "Van Horn" the surname?
How do these look:
"Benson Jr, Brent W"
(family "Benson")
(given "Brent" "W")
(suffix "Jr")
"Halstead Jr, Robert H"
(family "Halstead")
(given "Robert" "H")
(suffix "Jr")
"Steele Jr, Guy L"
(family "Steele")
(given "Guy" "L")
(suffix "Jr")
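Records like these could be produced mechanically from the "Family Suffix, Given" strings. A minimal Python sketch, assuming the only suffix that occurs is a trailing "Jr" (as in the current list) and that everything else before the comma is the family name; `parse_name` and `to_sexp` are hypothetical helpers, not part of any existing tool.

```python
SUFFIXES = {"Jr", "Jr."}  # assumption: the only suffixes seen in the list so far


def parse_name(text):
    """Split 'Family [Suffix], Given ...' into family, given, and suffix."""
    before, _, after = text.partition(",")
    family_words = before.split()
    suffix = None
    if len(family_words) > 1 and family_words[-1] in SUFFIXES:
        suffix = family_words.pop().rstrip(".")
    record = {"family": " ".join(family_words), "given": after.split()}
    if suffix:
        record["suffix"] = suffix
    return record


def to_sexp(record):
    """Render the record in the S-expression style proposed above."""
    lines = ['(family "%s")' % record["family"]]
    lines.append("(given %s)" % " ".join('"%s"' % g for g in record["given"]))
    if "suffix" in record:
        lines.append('(suffix "%s")' % record["suffix"])
    return "\n".join(lines)


print(to_sexp(parse_name("Steele Jr, Guy Lewis")))
```

Under this rule "Van Horn, David" keeps "Van Horn" as the family name, because nothing is split off except a recognized suffix.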
Van is... difficult. In some countries "Van" is ignored in the sort key, while in others it is counted. Whether it is capitalized also depends on the country, the language, or personal usage.
Regarding David Van Horn: David always uses the capitalized form, and BibTeX normally treats a capitalized token as part of the surname, so I guess it is not too bad to treat "Van Horn" as the surname.
And David does not write his name as "David V. Horn", so let's keep "Van" as-is. In most cases, a person's own usage is what matters.
Yes, people are the best authority on their own names.
If the default sort key is the family name, then the following would suffice.
(family "Van Horn")
(given "David")
This means that "Horn" never makes sense without the "Van" prefix; the name is always filed under "Van Horn".
There is another schemer, Anton van Straaten, who has at least one paper in the bibliography (not yet converted to S-expression metadata). In his name, the "van" is in lowercase. So I don't know whether it's "Straaten, Anton van" or "van Straaten, Anton" (and in the latter case, it could be alphabetized under "v" or "s" - who knows.)
In BibTeX, the latter is preferred, since "van" is a prefix of the family name (the von part in BibTeX terminology) and sorting ignores it anyway.
In CSL terminology, a "van" that is ignored in sorting is either a dropping-particle or a non-dropping-particle. The difference is whether the particle is dropped when the family name is displayed alone: a dropping particle gives "For details, see [Name, 2023]", while a non-dropping particle gives "For details, see [van Name, 2023]".
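Here is a small Python sketch of how the two CSL particle kinds could affect display and sorting. The record layout follows the CSL-JSON field names quoted earlier; `short_form` and `sort_key` are hypothetical functions, and sorting on the bare family name is only one of the conventions discussed in this thread. Assigning Anton van Straaten's "van" to `non-dropping-particle` is purely illustrative, since his own preference is not known here.

```python
def short_form(name):
    """Family name as shown alone in a citation such as [van Name, 2023]."""
    parts = [name.get("non-dropping-particle"), name["family"]]
    # A dropping particle is deliberately left out of the short form.
    return " ".join(p for p in parts if p)


def sort_key(name):
    """Assumed convention: sort on the family name, ignoring any particle."""
    return (name["family"].casefold(), name.get("given", "").casefold())


van_horn = {"family": "Van Horn", "given": "David"}
van_straaten = {"family": "Straaten", "given": "Anton",
                "non-dropping-particle": "van"}  # illustrative guess
steele = {"family": "Steele", "given": "Guy L", "suffix": "Jr"}

for person in sorted([van_horn, van_straaten, steele], key=sort_key):
    print(short_form(person))
```

With these records, "Van Horn" is filed under "V" because the whole string is the family name, while "van Straaten" is filed under "S" but still displayed with its particle.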