Provide a macro (or a hook) to show CJK characters in URLs

Open lemzwerg opened this issue 7 years ago • 4 comments

[The title of this issue is not optimal; maybe you can find something better.]

This StackExchange issue demonstrates that ugly hacks are necessary to avoid percent encoding for CJK characters in URLs as produced by biber. The issue is that biber returns a verbatim string for its urlraw field, making it necessary to define at least one new XeTeX character class (for character '/'; this is the most important case). Right now, setting up rudimentary interaction only between this new character class and xeCJK's main character class 1 needs five lines of xeCJK internal code...
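
For readers unfamiliar with the mechanism, here is a minimal sketch of what "defining a new XeTeX character class for '/'" means at the level of raw XeTeX primitives. It deliberately does not load xeCJK and is not xeCJK's internal interface. The class name `\slashclass` and the choice of `\allowbreak` as the inserted tokens are assumptions made for illustration only.

```latex
% Compile with xelatex. Raw XeTeX interchar primitives only; xeCJK is not loaded.
\documentclass{article}

\newXeTeXintercharclass\slashclass  % allocate a new class number (hypothetical name)
\XeTeXcharclass`\/=\slashclass      % move the character '/' into that class

% Tokens inserted at a transition between two classes.
% Class 0 is the default class of ordinary (e.g. Latin) characters.
\XeTeXinterchartoks 0 \slashclass = {\allowbreak}
\XeTeXinterchartoks \slashclass 0 = {\allowbreak}

\XeTeXinterchartokenstate=1         % switch the interchar mechanism on

\begin{document}
example.org/path/to/page
\end{document}
```

With xeCJK loaded, the same kind of transition also has to be declared against xeCJK's own classes, which is the part that currently requires internal code.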

I would like to have a macro that sets up a new character class, at the same time also defining default interaction between all xeCJK character classes and the new class.

Both my Chinese and my LaTeX3 knowledge are very rudimentary, so maybe this is already possible?

lemzwerg, Jan 24 '18 15:01

There is a way to show CJK characters in URLs with the option CJKmath and the help of package url.

In your example, we can say:

\documentclass{article}

\usepackage[english]{babel}
\usepackage{biblatex}
\usepackage[CJKmath]{xeCJK} %% activate CJKmath
\usepackage{libertine}
\usepackage{hyperref}

\addbibresource{2.bib}
\setCJKmainfont{ipam.ttf}

\DeclareFieldFormat{url}{%
  \mkbibacro{URL}\addcolon\space\href{#1}{\nolinkurl{\thefield{urlraw}}}}

\begin{document}
Hello \cite{Gakushyu}.
\printbibliography
\end{document}

It is not a perfect solution, because line breaks are not allowed between CJK characters in URLs.

xeCJK currently offers no convenient way to define a new character class of the kind you describe.

qinglee, Jan 24 '18 16:01

Thanks! The setup is indeed much simpler, but honestly I prefer my solution :-) The main reason is that with CJKmath the Latin glyphs of the CJK font get used, which don't harmonize with the surrounding non-CJK font. Of course, xeCJK is primarily an environment for documents that have a CJK script as the main script.

lemzwerg, Jan 30 '18 10:01

Non-CJK characters in URLs are set with \ttfamily, not with the CJK font. That is the default setting of the url package. We can change the font via \urlstyle{rm}, \urlstyle{sf}, etc.
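
A minimal sketch of that customization, assuming the same font file (ipam.ttf) as in the example above and using a placeholder URL that is not taken from the original bibliography; I have not checked it against every xeCJK version:

```latex
% Compile with xelatex.
\documentclass{article}
\usepackage[CJKmath]{xeCJK}   % CJKmath lets CJK characters appear in url's math-mode output
\usepackage{url}
\setCJKmainfont{ipam.ttf}

% Choose the font for the non-CJK part of URLs:
% tt (default), rm, sf, or same (keep the surrounding text font).
\urlstyle{rm}

\begin{document}
See \url{http://example.org/日本語} for details.  % placeholder URL
\end{document}
```

With \urlstyle{same}, the URL simply follows whatever font is active at that point in the document.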

qinglee, Jan 30 '18 12:01

Ah, ok. Well, your solution is then indeed preferable for the casual user. I think that having no (automatic) line breaks between CJK characters is not a serious drawback, since Western URLs also don't break in the middle of a word. If it is a problem, people can still use my approach.

I have updated the StackExchange issue accordingly.

lemzwerg, Jan 31 '18 11:01