
UTF8 in TextField's 'default'

Open madchemiker opened this issue 8 years ago • 5 comments

As in https://github.com/ho-tex/hyperref/issues/5, LuaLaTeX produces an incorrect result if you use non-ASCII characters for 'default' in TextField. pdfLaTeX seems to produce the correct result if you use only Latin-1 characters, but the problem occurs on pdfLaTeX too if you use Japanese characters for 'default'. The following hack fixes this problem (thanks to Ms. Fischer's comment in https://github.com/ho-tex/hyperref/issues/5).

\documentclass{article}

\usepackage{fontspec}		% ---- for LuaLaTeX
% \usepackage[utf8]{inputenc}	% ---- for pdfLaTeX

\usepackage[unicode=true]{hyperref}
\usepackage[T1]{fontenc}

% FIX for 'default'
% See also the fix for 'value' by Ms. Fischer
% ( https://github.com/ho-tex/hyperref/issues/5 ).
% I just replaced 'value' with 'default'.
\makeatletter
\define@key{Field}{default}{%
  \Hy@pdfstringdef\Fld@default{#1}}
\makeatother

% NOTE:
%   This fix IS needed for pdfLaTeX too, when you use Japanese
%   characters.
%
% \documentclass{article}
% \usepackage[whole]{bxcjkjatype}
% \usepackage[utf8]{inputenc}
% \begin{document}
% \begin{Form}
%   % without fix, fails to compile
%   \TextField[name=addr,default=東京]{Address} %Tokyo
% \end{Form}
% \end{document}

\begin{document}
\begin{Form}
  % OK, of course
  \TextField[name=textfield1]{Address} \\

  % OK
  \TextField[name=textfield2,value=Köln]{Address} \\

  % correct on pdfLaTeX (but incorrect if you use Japanese characters)
  % incorrect without FIX on LuaLaTeX
  \TextField[name=textfield3,default=München]{Address} \\

  % (though this is probably meaningless)
  % incorrect (internally) on LuaLaTeX
  % 
  % $ pdftk pr-textfield-default-encoding.pdf dump_data_fields
  % ...
  % FieldValue: Köln			% This is OK, but
  % FieldValueDefault: München	% nonsense!
  % ...
  \TextField[name=textfield4,value=Köln,default=München]{Address}
\end{Form}
\end{document}

madchemiker avatar Nov 04 '17 09:11 madchemiker

Sorry, I have noticed that this fix must be applied ONLY to TextField. It causes improper behavior for ChoiceMenu (on both pdfLaTeX and LuaLaTeX).

\begin{Form}
  \ChoiceMenu[radio,name=choice,default=Yes]{TeX User}{Yes,No}
\end{Form}

with "FIX" produces:

$ pdftk pr2.pdf dump_data_fields
---
FieldType: Button
FieldName: choice
FieldFlags: 49152
FieldValue: \376\377\000Y\000e\000s     # should be "Yes"
FieldJustification: Left
FieldStateOption: Yes
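
One way to keep the fix away from ChoiceMenu is to apply it only inside a group around each TextField. The following is a minimal sketch, assuming the key redefinition shown in the first post. The wrapper name \UnicodeTextField is hypothetical (it is not part of hyperref), and the sketch has not been checked against every driver.

```latex
\makeatletter
% Hypothetical wrapper: the redefinition of 'default' is local to the group,
% so ChoiceMenu and other fields keep hyperref's original key handling.
\newcommand*\UnicodeTextField[2][]{%
  \begingroup
  \define@key{Field}{default}{%
    \Hy@pdfstringdef\Fld@default{##1}}%
  \TextField[#1]{#2}%
  \endgroup
}
\makeatother
```

A text field would then be written as \UnicodeTextField[name=textfield3,default=München]{Address}, while \ChoiceMenu is called unchanged.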

madchemiker avatar Nov 04 '17 09:11 madchemiker

The fix for the default field is certainly needed. But I don't see a problem with the choice menu. With the option unicode you are forcing everything into UTF16BE, and so Yes is encoded as \376\377\000Y\000e\000s. If you don't like this try \usepackage[pdfencoding=auto]{hyperref} instead.
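
A minimal sketch of that suggestion, reusing the ChoiceMenu example from the previous comment. pdfencoding=auto is an existing hyperref option; the expectation that the pure-ASCII value then stays in 8-bit form, and so matches FieldStateOption, is an assumption that was not verified here.

```latex
\documentclass{article}
\usepackage{fontspec}                    % for LuaLaTeX
\usepackage[pdfencoding=auto]{hyperref}  % instead of unicode=true

\makeatletter
% the fix for 'default' from the first post
\define@key{Field}{default}{%
  \Hy@pdfstringdef\Fld@default{#1}}
\makeatother

\begin{document}
\begin{Form}
  % assumption: 'Yes' is pure ASCII, so it should not be converted to UTF-16BE
  \ChoiceMenu[radio,name=choice,default=Yes]{TeX User}{Yes,No}
\end{Form}
\end{document}
```

Running pdftk with dump_data_fields on the resulting PDF, as in the earlier comment, shows which encoding was actually written for FieldValue.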

u-fischer avatar Nov 05 '17 23:11 u-fischer

Thank you for your reply. I thought the FIX should not be applied to ChoiceMenu, because the following code with the FIX does not work as expected.

\documentclass{article}

\usepackage{fontspec}		% ---- for LuaLaTeX
% \usepackage[utf8]{inputenc}	% ---- for PDFLaTeX

\usepackage[unicode=true]{hyperref}
\usepackage[T1]{fontenc}

\begin{document}
\begin{Form}
  % 'Yes' is checked (as expected)
  \ChoiceMenu[radio,name=nofix,default=Yes]{TeX User?}{Yes,No}

  \makeatletter
  \define@key{Field}{default}{%
    \Hy@pdfstringdef\Fld@default{#1}}
  \makeatother

  % 'Yes' is NOT checked
  \ChoiceMenu[radio,name=withfix,default=Yes]{TeX User?}{Yes,No}
\end{Form}
\end{document}

But this is probably caused by the inconsistent encoding of FieldValue and FieldStateOption.

So I think I should now say: for ChoiceMenu, not only FieldValue but also FieldStateOption should be encoded as UTF-16.

FYI: The results of pdftk.

  1. the PDF file which is generated by LuaLaTeX

    $ pdftk choice.pdf dump_data_fields

    FieldType: Button
    FieldName: nofix
    FieldFlags: 49152
    FieldValue: Yes
    FieldJustification: Left
    FieldStateOption: Yes

    FieldType: Button
    FieldName: withfix
    FieldFlags: 49152
    FieldValue: \376\377\000Y\000e\000s
    FieldJustification: Left
    FieldStateOption: Yes

  2. After the PDF file is edited with Acrobat Reader DC (both "No" options checked)

    $ pdftk choice.pdf dump_data_fields

    FieldType: Button
    FieldName: nofix
    FieldFlags: 49152
    FieldValue: No
    FieldValue: Yes
    FieldJustification: Left
    FieldStateOption: No
    FieldStateOption: Off
    FieldStateOption: Yes

    FieldType: Button
    FieldName: withfix
    FieldFlags: 49152
    FieldValue: No
    FieldValue: \376\377\000Y\000e\000s
    FieldJustification: Left
    FieldStateOption: No
    FieldStateOption: Off
    FieldStateOption: Yes

I don't know why the FieldValue is duplicated...

madchemiker avatar Nov 06 '17 21:11 madchemiker

I see what you mean. I will look at it but not today.

u-fischer avatar Nov 06 '17 23:11 u-fischer

I think that in the next version it will work for umlauts and other characters in the T1 encoding, but not for Japanese; that would imho need extensive changes in the font resources.

u-fischer avatar Sep 16 '19 21:09 u-fischer