corpus-joyce-ulysses-tei
Loanwords / compound words / Joyceanisms
‘Proteus’ is an episode of edge cases. Today’s dilemma: loanwords.
As the cockle pickers pass Stephen on their way from the shore-line, he thinks to himself:
She trudges, schlepps, trains, drags, trascines her load. (U 3.392–93; my emphasis)
How would we encode this multilingual description in which translations for the verb ‘to drag’ from Yiddish / German (shlepn / schleppen), French (traîner), and Italian (trascinare) have been ‘Englished’ or anglicized† in Stephen’s interior monologue?
† OED has intr. to coin an English word by borrowing from another language (rare).
None of these non-standard words is italicized in the reading text, so we’d put an @rend="none" attribute on our tag. But what element does the Guidelines suggest for loanwords? <foreign> is clearly out of place, since Stephen is borrowing from non-English languages into English (he applies English verb conjugation to his borrowed verb forms). Cf., in this vein:
Number one swung lourdily her midwife’s bag (U 3.32; my emphasis)
I’m sure this phenomenon is not limited to ‘Proteus’ or to Stephen’s interior monologue. If we start to encounter it all over the corpus, it might be worth marking up.
I really like this idea. Let's do it. The only thing I can think of would be something like <seg type="loanword" subtype="fr">.
This seems related to the neologism/Joyceanism/compound word markup we're doing for Portrait in https://github.com/JonathanReeve/corpus-joyce-portrait-TEI/issues/36. For those, we're using <seg type="neologism">, but I can imagine getting more descriptive, and using <seg type="neologism"> for true Joyceanisms that don't have clear etymologies, and <seg type="compound"> for compound words.
If that sounds good, we can change the title of this issue to cover loanwords, compound words, and Joyceanisms.
What about using something like <distinct>?
<distinct> identifies any word or phrase which is regarded as linguistically distinct, for example as archaic, technical, dialectal, non-preferred, etc., or as forming part of a sublanguage.
We could @type this to include values like "loanword", "non-standard compound word" (or an abbreviation), "archaism", &c. Admittedly, implementing this tagging is likely a while off yet and a task that would require a dedicated crowd of encoders.
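To make the proposal concrete, here is a minimal Python sketch of what typed <distinct> tagging might produce for the ‘Proteus’ sentence. The function name `tag_distinct`, the word-to-type mapping, and the choice of which words to tag are all assumptions made for illustration; nothing here is an agreed project convention.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Capturing group so that re.split keeps the words as well as the text between them.
WORD = re.compile(r"([A-Za-z]+)")


def tag_distinct(text, types):
    """Wrap every word listed in `types` in a <distinct type="..."> element.

    `types` maps a lowercase word to a hypothetical @type value.
    All other text is passed through with XML escaping only.
    """
    pieces = []
    for piece in WORD.split(text):
        kind = types.get(piece.lower())
        if kind is None:
            pieces.append(escape(piece))
        else:
            pieces.append(
                f"<distinct type={quoteattr(kind)}>{escape(piece)}</distinct>"
            )
    return "".join(pieces)


# Illustrative mapping only: which words count as loanwords is an editorial call.
loanwords = {"schlepps": "loanword", "trascines": "loanword"}
print(tag_distinct("She trudges, schlepps, trains, drags, trascines her load.", loanwords))
```

A real pass would have to operate on the existing TEI tree rather than on a bare string, so that <lb/> milestones and other markup are left alone.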
<distinct> sounds great. Let's do it. I might know someone who might be interested in helping out with this--the maintainer of the Joyce Word Dictionary. Loanwords, compound words, dialectal words, and related words would be good to track. I agree, though, that this seems low-priority.
This is what I've been thinking: <distinct type="X">, where X is one of:
- compound: the word exists in the OED, but in its hyphenated form, except words whose only citations are from Joyce
- nonstandard-compound: a compound word not found in the OED, even in hyphenated form, but composed of two words that are found in the OED
- dialect: nonstandard dialect, slang, etc., whether or not it's found in the OED
  - for this, we can use @space to distinguish the dialect further, where it has a particular associated place to it (diatope in the TEI docs)
- archaism: archaisms generally out-of-use around the turn of the century
  - for these, we can use @time to specify the associated time
- Joycean: for obvious Joycean coinages, or other distinctive words that don't fit in the above categories, but seem to belong to no other obvious linguistic or lexicographical group, either.
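The two compound categories above lend themselves to a mechanical first pass. The following Python sketch assumes we have some set of dictionary headwords to test against; the function name `classify_compound`, the `min_part` threshold, and the toy headword set are hypothetical stand-ins, not actual OED data. It cannot apply the "only citations are from Joyce" exception, which would need citation-level information.

```python
def classify_compound(word, headwords, min_part=2):
    """Return 'compound', 'nonstandard-compound', or None for a closed-form word.

    `headwords` is any set of lowercase dictionary headwords (an assumption:
    the real source, e.g. an OED export, is not specified here).
    """
    word = word.lower()
    if word in headwords:
        return None  # the closed form is itself a headword, so nothing to flag

    # Every way of cutting the word into two parts of at least `min_part` letters.
    splits = [
        (word[:i], word[i:])
        for i in range(min_part, len(word) - min_part + 1)
    ]

    # 'compound': the dictionary has the word, but only in hyphenated form.
    if any(f"{left}-{right}" in headwords for left, right in splits):
        return "compound"

    # 'nonstandard-compound': no hyphenated entry, but both halves are headwords.
    if any(left in headwords and right in headwords for left, right in splits):
        return "nonstandard-compound"

    return None


# Toy headword set, invented for illustration only.
toy_headwords = {"fair-haired", "green", "vested", "slim", "sandalled"}
for candidate in ["fairhaired", "greenvested", "slimsandalled", "lourdily"]:
    print(candidate, classify_compound(candidate, toy_headwords))
```

Anything the function returns would still need a human check, since an accidental split (two unrelated headwords that happen to concatenate) is easy to hit with short parts.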
Sounds great, Jonathan; I particularly like compound and nonstandard-compound as values. I think this system covers most of our cases. The only other instances that might be worth flagging are (1) Joyce’s extensions of the meaning of a pre-existing word – a subtype of @type=Joycean perhaps? – and (2) his use of an obsolete or archaic sense of an otherwise still-current word (subtype of @type=archaism perhaps?). I’m sure, too, that we’ll encounter combinations of the @type values along the way: Elizabethanisms that survive in Hiberno-English, for example.
(1) Joyce extends ‘welsh comb’ [n. the thumb and four fingers] to include a verb form:
<p><lb n="070331"/>He [Simon Dedalus] took off his silk hat and, blowing out impatiently his bushy
<lb n="070332"/>moustache, welshcombed his hair with raking fingers.</p>
(2) Stephen imagines himself in a ‘medley’ drawing on the noun’s first sense in the OED:
A. n. I. The mixing or mingling of people in combat.
- Combat, conflict; fighting, esp. hand-to-hand fighting between two groups of combatants. Also: an instance of this; a war, battle; a tournament; a quarrel. Also fig. Cf. mellay n. 3, mêlée n. 1. Now rare (arch.).
<p><lb n="020314"/>Again: a goal. I am among them, among their battling bodies in a
<lb n="020315"/>medley, the joust of life.
Would it be an idea to bring Natasha into the conversation? Also, if we find we need to disambiguate your list further at some point down the road, well then, so be it.
This is fun as an intellectual exercise – he says, having started the thread – but it’s also more or less academic until we start the business of encoding.
In that vein, have we any way of filtering out all the non-<distinct> words? Running the corpus through a few different spell checkers might reduce the total lexicon down to something more manageable, for example. Or could we cross reference the lexicon with the headwords in P. W. Joyce’s English as we speak it in Ireland in order to single out the Hiberno-English? Has anyone looked at the Oxford Dictionaries API? Wonder is there a way to harness it?
What other strategies can we come up with? <distinct> tagging is so potentially massive that I suspect we’d want to figure out ways of getting a sizable amount of it automated in order to produce any kind of credible results or get anywhere near completeness.
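As a rough sketch of the spell-checker idea, the Python below reduces a text to the words that are absent from a reference wordlist. The function name `candidate_words` and the tiny `known` set are assumptions for illustration; in practice the wordlist would come from whatever dictionary, spell-checker lexicon, or headword list we settle on.

```python
import re
from collections import Counter

# Letters, optionally joined by internal apostrophes (straight or curly).
TOKEN = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def candidate_words(text, known_words):
    """Count the lowercase tokens in `text` that are not in `known_words`.

    Returns a dict mapping each unknown word to its frequency, i.e. the
    shortlist a human encoder would then review for <distinct> tagging.
    """
    counts = Counter(token.lower() for token in TOKEN.findall(text))
    return {word: n for word, n in counts.items() if word not in known_words}


# Stand-in wordlist, invented for this example.
known = {"she", "trudges", "trains", "drags", "her", "load"}
sentence = "She trudges, schlepps, trains, drags, trascines her load."
print(candidate_words(sentence, known))
```

The same function could be run a second time against a Hiberno-English headword list to split the shortlist into likely dialect words and everything else.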
Hi Jonathan and Ronan,
This is indeed a fun exercise! Another person worth consulting about how to categorize Joyce's neologisms would be Elizabeth M. Bonapfel. She presented a brilliant paper on the topic on the Joyce panel I organized at this year's MLA. But I think the above ideas are terrific, and bode well for the TEI editions. I am new to coding, and eager to get to work. Going to get set up in the coming days.
Great stuff, Natasha. Once we are all happy with the encoding conventions for <distinct> words in the corpus, we’ll make the rules easily accessible in the project CONTRIBUTING.md file.
I know Elizabeth well and thought of her in the context of Joyce’s word-compounding. I’ll give her a buzz and direct her here!
Also, @NRChenier, I meant to ask: have you found any useful compendia / articles / glossaries of Joyce’s non-standard compound words or neologisms etc? I’m wondering if some of the task of tracking down these instances of <distinct> hasn’t been done for us already (in Joyce crit. over the last sixty years or so). Or have you any ideas how we might isolate the terms? Cheers!
Sounds great @yellwork . And as per useful glossaries: none that I am aware of, besides that put forth by Elizabeth in the paper she presented. It is v helpful! Let's get her in here. You will contact her? I would also be happy to.
And good question re: isolating Joyce's
Andreas Fischer's essay "'Milly Bloom, fairhaired, greenvested, slimsandalled': Joyce's compound adjectives and the OED" in A Collideorscape of Joyce is well worth a read.
Let us know what you think @JonathanReeve .
Hey everyone! I'm going to add some of these tags in Telemachus.