Multiscript support in biblatex/biber

Open plk opened this issue 8 years ago • 175 comments

plk avatar Apr 26 '16 08:04 plk

I can't get the MWE from this SE thread to work.

\documentclass{article}
\usepackage{fontspec} 
\usepackage{polyglossia} 
\setdefaultlanguage{english}
\usepackage{xeCJK}
\setCJKmainfont{Hiragino Mincho Pro}

\usepackage[style=authoryear,%
            language=auto,%
            autolang=langname,%
          vform=romanised]{biblatex}
\addbibresource{literature.bib}
\usepackage{filecontents}
\begin{filecontents}{literature.bib}
@COLLECTION{yanagida_zengaku_sosho_1975,
  LANGID = {japanese},
  EDITOR = {柳田聖山},
  EDITOR_romanised = {Yanagida, Seizan},
  TITLE = {禪學叢書},
  TITLE_romanised = {Zengaku sōsho},
  TITLE_translated_english = {Collected Materials for the Study of Zen},
  LOCATION = {京都},
  LOCATION_romanised = {Kyōto},
  LOCATION_translated_english = {Kyoto},
  PUBLISHER = {中文出版社},
  PUBLISHER_romanised = {Chūbun shuppansha},
  DATE = {1974/1977}
}
\end{filecontents}
\begin{document}
Hello World.\footcite{yanagida_zengaku_sosho_1975}
\nocite{*}

\printbibliography
\end{document}

The vform=romanised option isn't recognised:

Package xkeyval Error: `vform' undefined in families `blx@opt@pre'.

Is this still the preferred way to include multilingual data in a biblatex file? It seems that romanised is a specific use case, whereas transliteration would be more fitting. Sanskrit texts, for example, were transcribed into Chinese without being "romanised". Are there limits on the number of transcriptions and translations one can include?

Are there any restrictions on the contents of the LANGID field? Most catalogues that include language information seem to have it in ISO form. Does each item require one primary language, or would a bilingual edition have two LANGIDs? And how do these interact with babel/polyglossia?

duncdrum avatar May 03 '16 14:05 duncdrum

The multiscript code was never in a released version; it lived in a separate git branch. That branch is currently in limbo and hasn't been updated in a long time, because it really over-complicated the biblatex internals and I hit several problems. I would like to look at it again at some point, but at the moment it's not really usable.

plk avatar May 03 '16 15:05 plk

I see. So for the biblatex export of our records, we should wrap transliteration and translation into the standard biblatex fields?

...
EDITOR = {Yanagida~Seizan, 柳田聖山},
...

since biblatexml seems to be in a similar state of limbo?

duncdrum avatar May 03 '16 15:05 duncdrum

It's not so much to do with the data source format (biblatexml has had a lot of work for version 3.4, which will be released soon) as with the internals which handle multi-script. There is currently no obvious way to deal with this, apart perhaps from the related entry functionality: you could have multiple entries linked by RELATED fields, and then you'd have to write driver macros to support this. You could ask on TeX.SE. The format you suggest won't work, because name fields have to be parsed by the usual bibtex name parsing rules.
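
A minimal sketch of the related-entry idea, reusing the data from the MWE above. The entry keys and the relatedtype value "romanised" are made up for illustration; biblatex has no such type out of the box, so the driver macros would still have to be written. The romanised title is also supplied here as an assumption about what the entry should contain.

```latex
% Sketch only: two linked entries instead of several script variants
% of each field inside one entry.
\begin{filecontents*}{literature.bib}
@collection{yanagida1975,
  langid      = {japanese},
  editor      = {柳田聖山},
  title       = {禪學叢書},
  location    = {京都},
  publisher   = {中文出版社},
  date        = {1974/1977},
  related     = {yanagida1975-romanised},
  relatedtype = {romanised}, % hypothetical type, not defined by biblatex
}
@collection{yanagida1975-romanised,
  editor    = {Yanagida, Seizan},
  title     = {Zengaku sōsho},
  location  = {Kyōto},
  publisher = {Chūbun shuppansha},
  date      = {1974/1977},
}
\end{filecontents*}
```

Both name fields here follow the usual bibtex name syntax, so the parsing problem with the combined EDITOR value does not arise.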

plk avatar May 03 '16 15:05 plk

Dear @plk, I would be glad to give feedback and to test such features. If I remember the conversation on the previous topic correctly, two points are very important:

  • the biblatex and biber settings must be kept in sync
  • the possibility to specify a normalised form in a different language for different fields of the same entry.

maieul avatar May 06 '16 12:05 maieul

I'm still interested in this topic too. IMHO the problems are:

  • lists (of locations or names), as it can happen that only some names need a translation/transcription.
  • the handling of names where "first name, last name" doesn't work in the same way as in English.
  • how to avoid overly complicated "if this variant exists then ... else if this variant exists ... else ..." chains in the drivers.

u-fischer avatar May 19 '16 09:05 u-fischer

It's exactly point three you mention that made me think that the last PoC I did of this was not the right way - it got extremely complicated and ugly in the internals. Point two about names should I think be possible already with the name changes in 3.3 - that was partly the motivation - you can define custom name parts and also customise how sort keys are constructed for name parts (see 93-nameparts.tex sample file which implements basic Russian patronymics).

A problem I foresee is how to determine labelname, labeltitle etc. without demanding that every field they reference has every script variant defined. The alternative is a messy interface to select the script variant of them and I'd rather avoid that.

plk avatar May 19 '16 09:05 plk

Can the name parts interface handle different name types in one author list in one go? That means a Chinese name, a Russian name with a patronymic and a German name? And the main question: how can one manage a bib in XML format? Things like your answer here http://tex.stackexchange.com/a/308761/2388 look very good, but I doubt that users want to write XML manually.

Perhaps biber could handle an input syntax like this:

 author={\namepart{family}{Fischer} \namepart{given}{Ulrike}  and \namepart{family}{...} \namepart{prefix}{von}  }

Then one wouldn't have to convert everything to xml to explore the power of the \namepart system.

Regarding the labelname: I think one shouldn't overdo the automation but allow users to define them manually if their wishes get too special, e.g. labelname_typeX={...} with some interface to select such labelnames.

u-fischer avatar May 19 '16 09:05 u-fischer

I would need to experiment a bit but essentially, any name parts defined by \DeclareDatamodelConstant[type=list]{nameparts}{ ... } are available to names. This doesn't really address the multi-script requirement though. The XML format has had a lot of work for 3.4. You don't need to write it by hand - tool mode can convert to/from this (I use the XML format for my own work).
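
As a rough sketch of what such a declaration looks like, modelled on the 93-nameparts.tex sample: the file name mynames.dbx is only illustrative, and "patronymic" is the extra name part added to the four standard ones.

```latex
% Hypothetical file mynames.dbx (a data model extension file).
% prefix, family, suffix and given are the standard name parts;
% patronymic is the custom addition.
\DeclareDatamodelConstant[type=list]{nameparts}{prefix,family,suffix,given,patronymic}

% In the document preamble, load the extension via the datamodel option:
%   \usepackage[datamodel=mynames]{biblatex}
```

Declaring the part only makes it available to the data model; formatting and sorting for it still have to be set up separately.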

I am a bit loath to extend the bibtex format, as it usually means hacking the btparse C library and this is painful and fragile. It's also a general CPAN module and so it must always remain backwards compatible with any generic bibtex usage.

plk avatar May 19 '16 09:05 plk

The problem with converting bib<->xml is that it will (probably) only work as long as the content is usable in both formats. But I do understand that extending the format of normal name fields is difficult. What about a new field format with a strict input syntax with name parts? Then one could use xauthor={....}.

Btw: I get errors when converting to xml and back to bib:

    G:\biblatextest>biber --tool --output-format=biblatexml biblatex-examples.bib
    G:\biblatextest>biber --tool --output-format=bibtex biblatex-examples_bibertool.bltxml

u-fischer avatar May 19 '16 12:05 u-fischer

You need to use --input-format=biblatexml on the second run. I suppose I could auto-detect this from the filename extension.

plk avatar May 19 '16 15:05 plk

@u-fischer - I have added an extended name format for bibtex data sources when using biber. It allows you to specify the name parts explicitly and you can mix and match this with normal bibtex names:

AUTHOR = {Alan Smith and family=Brown, prefix=de, given=Robert}

I'd rather not encourage tex markup in names, hence this format (which has to be handled in biber anyway).

Detection of which parsing routine to use is automatic, but you can turn off extended name format parsing with a biber flag in case of issues. The format also allows explicit specification of prefixes and supports any custom nameparts defined in the data model. See the biber PDF doc and the 93-nameparts.tex sample file that comes with biblatex, which uses both biblatexml and this extended bibtex format. This is in the 2.6/3.5 dev versions.
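
A minimal test document for the mixed format might look like the following sketch. The entry key, title and date are placeholders, and it assumes a biber/biblatex version that already includes the extended name format.

```latex
\documentclass{article}
\usepackage{filecontents}

\begin{filecontents*}{\jobname.bib}
@book{smith,
  % first name in normal bibtex syntax, second in the extended format
  author = {Alan Smith and family=Brown, prefix=de, given=Robert},
  title  = {Test},
  date   = {2016}
}
\end{filecontents*}

\usepackage[style=authoryear,backend=biber]{biblatex}
\addbibresource{\jobname.bib}

\begin{document}
\textcite{smith}
\printbibliography
\end{document}
```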

plk avatar May 21 '16 17:05 plk

This sounds very good, I will try it tomorrow. And it is of course OK that it is not TeX syntax; I only used that because I'm used to it.

u-fischer avatar May 21 '16 21:05 u-fischer

Actually, you are right about the bib<->biblatexml round-trip. There is currently no support for biblatexml->bib, only bib->bib.

EDIT: See below, this is now possible - tool mode can now convert between anything, including the extended name format and normal name format.

plk avatar May 23 '16 22:05 plk

I am new to this discussion and cannot really help developing code etc.

But I can speak Japanese and have rudimentary knowledge of Chinese and Korean, and I do write in the humanities in several languages, using multilingual bibliographies (including Western languages). I will be glad to help with comments if you wish so and can test documents. If this is more of a nuisance, do not hesitate to tell me. This is OK for me.

As for author names, the solution of plk AUTHOR = {Alan Smith and family={Brown}, prefix={de}, given={Robert}} looks good. There is another case with generation names in Korean and Chinese; some want them mentioned isolated, others include them into their personal (given) name.

This is all for now.

Shinoto-github avatar May 26 '16 02:05 Shinoto-github

This could be helpful when I get time to look more into it. The new name format you mention is already in the biblatex 3.5/biber 2.6 development versions (on SourceForge) and you can define any new nameparts you need to deal with things like generation names. However, the main issue with multilingual support is having multiple copies of the same field in the same bibliography data entry, and this is something which is quite hard to implement.

plk avatar May 26 '16 07:05 plk

Great to hear about the name format in the development versions!

I see three main problems with Far Eastern sources which I will explain below. If I understand something wrong, please do not take your precious time to correct my view. I would be embarrassed if my comments steal your time rather than help finding a solution.

  1. Different formats in the same bibliography. Some Far Eastern journals request separation of Western and Far Eastern languages, so it is easy to end up with two different bibliographies, and it should be easy to apply a style that is appropriate to each language. But since the style is chosen in the preamble, it must be a universal style for the different cases.
  2. Brackets for certain fields (like series titles) are different from those in Western languages. If a package like csquotes can deal with Japanese etc. brackets, and we could choose them in a modification of the style, that would make the bibliography look a lot better.
  3. As for multiple name versions, there are mostly these types: (1) translations into different languages (for institutions), (2) transcriptions into different writing systems according to different transcription systems, (3) the original writing, and (4) a transcription for identification. The fourth is particularly important in styles similar to authoryear, where the author is identified and mentioned once, whereas each entry might have a different original, transcription or translation. Basically, though, every field can have these variations; a title can be translated or transcribed as well.

For the time being, I write the ID transcription (4) in romanised form into the normal author and editor fields of my biblatex files. For the other cases I use a field-naming system that adds the type and the intended language or writing system to the field name, e.g. "author" becomes "author-trsscpt-hepburn", "author-trslation-de", "author-orig-ja" etc. Whenever I use the entries, for now I use regular expressions to create the fields that the authoryear style recognises, so that the respective data gets printed.

Shinoto-github avatar May 26 '16 08:05 Shinoto-github

The new way to give name parts explicitly is really useful. I still think that the biblatexml is a bit too overwhelming for the user, while .bib files are very easy to understand.

Would it also be possible to give the sortnamekeyscheme in this model (per-name)?

moewew avatar Jun 05 '16 16:06 moewew

@moewew - yes, that's important. It should work now for per-namelist and per-name scope in bibtex datasources - see the biblatex doc on \DeclareSortingNamekeyScheme.

I agree about biblatexml - until there is some GUI interface to a backend XML format like this, it's not very easy to see things at a glance even though it's conceptually easier and less prone to errors.

plk avatar Jun 05 '16 17:06 plk

That works very well, thank you.

I noticed that Biber complains (use of uninitialized value in (.)) if a name does not include a family part. The output is fine, but it seems Biber expects names to have a family part (maybe this is connected to the next observation). With the new name scheme it would probably be necessary to allow for customisation of the uniquename option (per sortnamekeyscheme or something new).

This all came up in "Bibtex/Biber: how to cite an author using Ethiopian conventions?" on TeX.SX.

MWE (for the use of uninitialized value in (.))

\documentclass{article}
\usepackage{filecontents}

\begin{filecontents*}{\jobname.bib}
@book{james,
  author  = {given=James},
  title   = {Test},
  date    = {1983}
}
\end{filecontents*}

\usepackage[style=authoryear]{biblatex}

\addbibresource{\jobname.bib}

\begin{document}
\textcite{james}
\printbibliography
\end{document}

For me the allure of the .bib format is that you don't need a GUI to work with it comfortably. (Judging by the number of questions on TeX.SX about JabRef and other exporters they can even cause more problems than they solve.) Even though I have a soft spot for XML and agree that it is a nicer format to store the data, the biblatexml format is a bit too verbose for me to work with manually.

moewew avatar Jun 06 '16 07:06 moewew

Yes, I need to remove the last traces of assumption that every name has a family name - that's been hard-coded into biblatex/biber for a long time. Then I think uniquename etc. need to be customisable. Looking into it.

plk avatar Jun 06 '16 09:06 plk

Probably a subject for a separate discussion: while I really like the new flexible data model for names, I think biblatex has finally reached a point where we have to think more about how GUIs can handle the changes.

I just added pen names to biblatex-fiwi, which works fine. But if I have an entry like

Author = {given={William}, family={Atheling}, suffix={Jr.}, truefamily={Blish}, truegiven={James}},

in a .bib file, BibDesk, my GUI of choice, can't handle it anymore. And, of course, the biblatexml can't be handled by any application.

I realize that the biblatex devs can't also take care of the various GUIs, and unfortunately the BibDesk developers haven't been very forthcoming about biblatex in the past. But if some kind of exchange or communication with some GUI devs could be established, this would be a big boon.

OTOH, if I end up with a .bib file which I can only edit with a text editor, that would, at least for me, be a big step backwards.

simifilm avatar Jun 06 '16 09:06 simifilm

Indeed, the BibDesk team is not very open to biblatex. For example, they don't want to support the nested crossref mechanism.

However, may I suggest using : as the separator inside the field

Author = {given:{William}, family:{Atheling}, suffix:{Jr.}, truefamily:{Blish}, truegiven:{James}}

That would make it compatible with the GUI without any modification.

maieul avatar Jun 06 '16 09:06 maieul

Good idea - I only use Emacs and so I am not really aware of the GUI situation.

plk avatar Jun 06 '16 10:06 plk

I have tested with BibDesk: ":" works. What should be tested next, I think, are Zotero (with https://github.com/retorquere/zotero-better-bibtex) and JabRef.

maieul avatar Jun 06 '16 10:06 maieul

However, may I suggest using : as the separator inside the field

I can't test it, but why should BibDesk care about the separator (colon or equals sign)? IMHO the only thing that should matter in a "normal" bibtex application is the number of commas.

u-fischer avatar Jun 06 '16 10:06 u-fischer

I imagine it confuses it with the = after the field name. I will make the separator configurable.

plk avatar Jun 06 '16 10:06 plk

I imagine it confuses it with the = after the field name.

It is ;-)

maieul avatar Jun 06 '16 11:06 maieul

I imagine it confuses it with the = after the field name.

It is ;-)

This would be a bug in BibDesk. Fields and names should be allowed to contain an equals sign. Does an equals sign in other fields break things too?

u-fischer avatar Jun 06 '16 11:06 u-fischer

I was wrong. The example from @simifilm works for me (except that BibDesk is not able to extract the correct information).

So for now, maybe we should wait for tests in more common GUIs.

maieul avatar Jun 06 '16 11:06 maieul