Some org author names are being incorrectly rearranged around commas on import
Problem
Some imported author names from MARC imports are being treated as comma separated personal names when they are org names with parenthetical place clarifications. There may be other examples of similar problems.
The problem is that commas are assumed to indicate 'Last, First' personal names.
The fix would be to add some logic to distinguish between valid 'Last, First' personal names and other names containing commas, which should not be rearranged.
To do this we probably need some more examples.
Although, since MARC 710 is 710 - Added Entry-Corporate Name, it makes sense that this field should never be rearranged as a personal name... needs some thought.
example: China). Zhongguo shi ge yan jiu zhong xin Shou du shi fan da xue (Beijing
https://openlibrary.org/authors/OL14087565A/China).Zhongguo_shi_ge_yan_jiu_zhong_xin_Shou_du_shi_fan_da_xue(Beijing
Original MARC:
https://openlibrary.org/show-records/harvard_bibliographic_metadata/ab.bib.10.20150123.full.mrc:134034844:1323
The name is taken from 710$
710 2 $6880-04$aShou du shi fan da xue (Beijing, China).$bZhongguo shi ge yan jiu zhong xin.
And '(Beijing, China)' is being treated and rearranged as if it were 'Last, First'
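A minimal sketch of the failure, assuming the importer splits on the first comma and swaps the two halves. `naive_flip` is a hypothetical helper written for illustration, not the actual Open Library code, and the input assumes $a and $b were joined with a space and the final period dropped:

```python
def naive_flip(name: str) -> str:
    """Treat any name containing ', ' as 'Last, First' and swap the halves.

    Hypothetical illustration only; not taken from the Open Library codebase.
    """
    head, sep, rest = name.partition(", ")
    if not sep:
        # No comma: nothing to rearrange.
        return name
    return f"{rest} {head}"


# Assumed joined form of 710 $a + $b from the record above.
corporate = "Shou du shi fan da xue (Beijing, China). Zhongguo shi ge yan jiu zhong xin"

print(naive_flip("Morris, Tom"))  # a genuine 'Last, First' name flips as intended
print(naive_flip(corporate))      # the comma inside '(Beijing, China)' triggers a bogus flip
```

Under those assumptions the second call yields `China). Zhongguo shi ge yan jiu zhong xin Shou du shi fan da xue (Beijing`, which matches the mangled author name in the example.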
Reproducing the bug
- Go to ...
- Do ...
- Expected behavior:
The resultant name should be more like:
Shou du shi fan da xue (Beijing, China) – Zhongguo shi ge yan jiu zhong xin
or 首都师范大学 (Beijing, China) – 中国诗歌硏究中心
- Actual behavior:
Context
- Environment (prod, dev, local): prod
The original characters in this example are also not being imported, that would be a new feature, but there may be existing code for other fields to fetch 880 original script values.
880 2 $6710-04$a首都师范大学 (Beijing, China).$b中国诗歌硏究中心.
Noting this in passing.
That looks like a case that #7652 was intended to handle. It has the correct linkage in the $6 (710-04), so the Chinese script should have been associated with the author record as an alternate name (although I'd actually prefer to have it as the default name).
One strong reason to try to get this right is that I've yet to find any translation software that can deal with transliterated text (although it might be possible with a multi-step process), but if I paste the Chinese into Google Translate, I get back the perfectly reasonable translation of "Capital Normal University (Beijing, China).$bChinese Poetry Research Center."
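The $6 linkage described above (the 710 carries `880-04` and the 880 carries `710-04`) could be resolved along these lines. This is only a sketch: fields are modelled as plain dicts of subfield code to value, which is an assumption made for illustration and not how the Open Library MARC parser represents them, and `parse_link` and `find_linked_880` are hypothetical names.

```python
import re

# A $6 value starts with a three-digit tag and a two-digit occurrence number,
# optionally followed by script/orientation codes (e.g. "880-04/$1").
LINK_RE = re.compile(r"^(\d{3})-(\d{2})")


def parse_link(sub6: str):
    """Return (linked_tag, occurrence) from a $6 value, or None if malformed."""
    m = LINK_RE.match(sub6)
    return (m.group(1), m.group(2)) if m else None


def find_linked_880(tag: str, sub6: str, fields_880: list):
    """Find the 880 field that points back at `tag` with the same occurrence."""
    link = parse_link(sub6)
    # Occurrence "00" means there is no associated field.
    if link is None or link[0] != "880" or link[1] == "00":
        return None
    for candidate in fields_880:
        if parse_link(candidate.get("6", "")) == (tag, link[1]):
            return candidate
    return None


fields_880 = [{"6": "710-04", "a": "首都师范大学 (Beijing, China).", "b": "中国诗歌硏究中心."}]
original = find_linked_880("710", "880-04", fields_880)
print(original["a"], original["b"])
```

Matching on both the tag and the occurrence number keeps a 710 from picking up an 880 that belongs to a different field in the same record.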
Although, since MARC 710 is 710 - Added Entry-Corporate Name, it makes sense that this field should never be rearranged as a personal name... needs some thought.
Actually, there are specific guidelines for this field at https://www.loc.gov/marc/bibliographic/bdx10.html and they say that a First Indicator value of 2, which this 710 has, means "Name in direct order" so it definitely shouldn't be rearranged.
The full set of values is:
Type of corporate name entry element:
- 0 - Inverted name
- 1 - Jurisdiction name
- 2 - Name in direct order
Meeting names have a similar set of indicators: https://www.loc.gov/marc/bibliographic/bdx11.html
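One way the first indicator could guard the rearrangement, sketched under assumptions: the tag set (110/710 for corporate names, 111/711 for meeting names) and the function name `keep_name_order` are illustrative choices, not existing Open Library code.

```python
# First indicator values for X10 (corporate) fields, per the LoC guidelines
# linked above; X11 (meeting) fields use a similar set.
ENTRY_ELEMENT_TYPES = {
    "0": "Inverted name",
    "1": "Jurisdiction name",
    "2": "Name in direct order",
}

# Assumed set of corporate and meeting name tags to check.
CORPORATE_OR_MEETING_TAGS = {"110", "111", "710", "711"}


def keep_name_order(tag: str, ind1: str) -> bool:
    """Return True when the field declares its name to be in direct order,
    in which case the name must never be flipped around a comma."""
    return tag in CORPORATE_OR_MEETING_TAGS and ind1 == "2"


# The 710 from this issue has first indicator 2, so it is left untouched.
print(keep_name_order("710", "2"))
```

A stricter variant would refuse to flip any corporate or meeting name regardless of indicator, since none of them are 'Last, First' personal names; which rule is right needs the extra examples mentioned earlier in the thread.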
I haven't investigated in depth, but something suspicious that catches my eye is that the $6 subfield isn't listed for any of these entries:
https://github.com/internetarchive/openlibrary/blob/bf4bc9d8e9d4f3bda987215a310c959b23ca6d52/openlibrary/catalog/marc/parse.py#L580-L585
but in any case, it should be pretty quick to find with an appropriate test case or two...
This is another example for an 880 original script test case: https://openlibrary.org/show-records/harvard_bibliographic_metadata/ab.bib.13.20150123.full.mrc:554571729:1305
Thanks for your analysis on this issue, @tfmorris!
I've been trying to add a failing test for the name ordering, and it keeps getting handled correctly in the current code. I think I fixed this recently in #9601 by refactoring, before I noticed this specific version of the problem. This import seems to have occurred after the merge but, crucially, before the fixes were deployed to production.
I think the name ordering is fixed by #9601, but I will continue with the 880 original script improvement. I agree, I thought we'd fixed this previously (it's working for titles and direct authors), but it looks like the 7XX fields missed out on the original script treatment. I'll rectify this.
Another recent example of this issue (for testing the fix): https://openlibrary.org/books/OL30706353M/Ubuntu_good_faith_and_equity and its org/conference author: https://openlibrary.org/authors/OL8657904A/South_Africa)Humboldt_Kolleg_Interdisciplinary_Conference(2010_Potchefstroom