SATySFi
Proposal: I18n: Support non-English hyphenation dictionaries
This proposal adds support for hyphenation in non-English languages. It is a first step toward supporting internationalization.
Proposal
- Add a new type:
  - `hyphen-dict`: hyphenation pattern. The underlying OCaml representation is `LoadHyph.t`.
- Add new primitives:
  - `load-hyphen-dict : string -> hyphen-dict`
  - `set-hyphen-dict : hyphen-dict -> ctx -> ctx`
  - `get-hyphen-dict : ctx -> hyphen-dict`
- Use BCP 47 Language Tag or UTS#35 Language Identifier for filenames of hyphenation dictionary files.
  - The current hyphenation file `english.satysfi-hyph` needs to be renamed to `en.satysfi-hyph`.
- `load-hyphen-dict language` loads a hyphenation dictionary from `hyph/<language>.satysfi-hyph`. It raises an exception when the file is not found.
- `set-hyphen-dict hyph ctx` sets the hyphenation pattern `hyph` to `ctx.hyphenation_pattern`.
- `get-hyphen-dict ctx` returns the hyphenation pattern `ctx.hyphenation_pattern`.
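
The behaviour of the three primitives can be sketched in OCaml roughly as follows. This is only an illustration of the intended shape, not the actual implementation: the record types `hyphen_dict` and `ctx`, the `lib_root` parameter, the `font_size` field, and the exception name `Hyphen_dict_not_found` are all assumptions made for the example, and the dictionary is kept as raw file contents instead of a parsed `LoadHyph.t`.

```ocaml
(* Hypothetical stand-ins: in SATySFi the dictionary is LoadHyph.t and the
   context is a much larger record. Only the shape of the API is sketched. *)
type hyphen_dict = { language : string; raw : string }
type ctx = { hyphenation_pattern : hyphen_dict; font_size : float }

(* Assumed exception name; the proposal only says "raises an exception". *)
exception Hyphen_dict_not_found of string

(* hyph/<language>.satysfi-hyph under an assumed library root *)
let dict_path ~lib_root language =
  Filename.concat (Filename.concat lib_root "hyph") (language ^ ".satysfi-hyph")

(* load-hyphen-dict : string -> hyphen-dict; raises when the file is missing *)
let load_hyphen_dict ~lib_root language =
  let path = dict_path ~lib_root language in
  if not (Sys.file_exists path) then raise (Hyphen_dict_not_found path);
  let ic = open_in_bin path in
  let raw =
    Fun.protect
      ~finally:(fun () -> close_in ic)
      (fun () -> really_input_string ic (in_channel_length ic))
  in
  { language; raw }

(* set-hyphen-dict : hyphen-dict -> ctx -> ctx *)
let set_hyphen_dict hyph ctx = { ctx with hyphenation_pattern = hyph }

(* get-hyphen-dict : ctx -> hyphen-dict *)
let get_hyphen_dict ctx = ctx.hyphenation_pattern
```

Since `set_hyphen_dict` returns a new context instead of mutating the old one, a document could switch dictionaries for a single paragraph and keep the original context for the rest.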
Current Implementation
- English hyphenation is located at `lib-satysfi/dist/hyph/english.satysfi-hyph`.
- `english.satysfi-hyph` is loaded at https://github.com/gfngfn/SATySFi/blob/1243829f9dcaf955e4ba0f5222a0f95b34e74e32/src/frontend/primitives.cppo.ml#L604
- The only operation which sets `hyphenation_dictionary` is `get_pdf_mode_initial_context` at https://github.com/gfngfn/SATySFi/blob/1243829f9dcaf955e4ba0f5222a0f95b34e74e32/src/frontend/primitives.cppo.ml#L497
Alternative Options
Activate multiple `hyphen-dict`s at the same time
This proposal is based on a design in which users can replace the English hyphenation pattern with another language's. If we decide to extend the multi-language system, in which English and Japanese are automatically detected by script type, it may be more natural to assign a hyphenation dictionary to each language/script (i.e., `set-hyphen-dict : language-tag -> hyphen-dict -> ctx -> ctx` or `set-hyphen-dict : hyphen-dict language-tag-map -> ctx -> ctx`) rather than applying a given hyphenation pattern globally.
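
As a rough sketch of the per-language alternative, the context could hold a map from language tag to dictionary. The names `LangMap`, `hyphen_dicts`, `font_size`, and `find_hyphen_dict` are hypothetical and exist only for this example; the record types again stand in for `LoadHyph.t` and the real context.

```ocaml
module LangMap = Map.Make (String)

(* Hypothetical stand-ins for LoadHyph.t and the SATySFi context. *)
type hyphen_dict = { patterns : string list }
type ctx = { hyphen_dicts : hyphen_dict LangMap.t; font_size : float }

(* set-hyphen-dict : language-tag -> hyphen-dict -> ctx -> ctx *)
let set_hyphen_dict tag dict ctx =
  { ctx with hyphen_dicts = LangMap.add tag dict ctx.hyphen_dicts }

(* Lookup for a detected language. None means no dictionary is registered
   for that tag, so the caller would leave the text unhyphenated. *)
let find_hyphen_dict tag ctx = LangMap.find_opt tag ctx.hyphen_dicts
```

With this shape, several dictionaries are active at once and the line breaker picks one by the language tag detected for each run of text.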
Introducing new type hyphen-dict
Instead of introducing `hyphen-dict` and having users explicitly handle hyphenation dictionaries, we could provide primitives that get and set strings representing languages (e.g., `set-hyphen-dict : string -> ctx -> ctx`).
However, the `hyphen-dict` type allows more extension points in the future (e.g., tweaking hyphenation patterns, adding exceptional words ad hoc).
`load-hyphen-dict` throwing exceptions
`load-hyphen-dict` could instead have the signature `load-hyphen-dict : string -> hyphen-dict option`. I don't have a strong opinion about this. I was thinking of having a new package for each language, so specifying a wrong filename is unlikely.
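
For comparison, an option-returning loader might look like the sketch below. The names `load_hyphen_dict_opt`, `load_or_default`, and `lib_root` are assumptions for illustration, and the dictionary is again modelled as raw file contents instead of `LoadHyph.t`.

```ocaml
(* Hypothetical stand-in for LoadHyph.t. *)
type hyphen_dict = { raw : string }

(* load-hyphen-dict : string -> hyphen-dict option
   Returns None when the file cannot be opened instead of raising. *)
let load_hyphen_dict_opt ~lib_root language =
  let path =
    Filename.concat (Filename.concat lib_root "hyph") (language ^ ".satysfi-hyph")
  in
  match open_in_bin path with
  | exception Sys_error _ -> None
  | ic ->
    let raw =
      Fun.protect
        ~finally:(fun () -> close_in ic)
        (fun () -> really_input_string ic (in_channel_length ic))
    in
    Some { raw }

(* With an option, the caller has to choose the fallback explicitly. *)
let load_or_default ~lib_root ~default language =
  Option.value (load_hyphen_dict_opt ~lib_root language) ~default
```

The trade-off is the usual one: the option forces every caller to handle a missing file, while the exception keeps the common case short when a wrong filename is unlikely.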
Having a primitive to get available hyphenation dictionary files
I could include another primitive `get-hyph-dict-list` that returns the available files under `hyph/` (for example, returning `[ "en" ]`). This primitive is not mandatory.
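
One possible shape for such a listing primitive is sketched here. The helper `tags_of_filenames`, the `lib_root` parameter, the sorted output, and the choice to return an empty list when the directory is missing are assumptions made for the example.

```ocaml
let suffix = ".satysfi-hyph"

(* Pure part: keep only dictionary files and strip the extension. *)
let tags_of_filenames names =
  names
  |> List.filter (fun name -> Filename.check_suffix name suffix)
  |> List.map (fun name -> Filename.chop_suffix name suffix)
  |> List.sort compare

(* get-hyph-dict-list: list what is available under <lib_root>/hyph.
   A missing directory yields an empty list in this sketch. *)
let get_hyph_dict_list ~lib_root =
  let dir = Filename.concat lib_root "hyph" in
  if Sys.file_exists dir && Sys.is_directory dir
  then tags_of_filenames (Array.to_list (Sys.readdir dir))
  else []
```

For a directory that contains only `en.satysfi-hyph`, this would return `["en"]`, matching the example above.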
Renaming `english.satysfi-hyph` to `en.satysfi-hyph`
We could leave the filename as is. However, considering that even TeX has already adopted a naming scheme based on BCP 47 Language Tags, there is no reason to stick to the traditional naming scheme that uses English language names.
And, well, the hyphenation files of that thing called TeX are named like this these days, too. #TeX pic.twitter.com/48vtJFz8G7
— 某ZR(ざんねん🙃) (@zr_tex8r) January 11, 2020
May I consider this proposal approved? If so, I’ll work on this after the refactoring is done.