
Multilanguage support

Open gmdig opened this issue 10 months ago • 7 comments

Is your feature request related to a problem?

Please add other language support. Thanks!

Describe the solution you'd like

No response

Describe alternatives you've considered

No response

Additional context

No response

gmdig avatar Feb 28 '25 16:02 gmdig

Support for other languages requires rewriting the typesetting algorithm, which we are working hard to implement.

awwaawwa avatar Feb 28 '25 16:02 awwaawwa

Traditional Chinese may be added soon, as it doesn't require algorithm adjustments.

awwaawwa avatar Feb 28 '25 16:02 awwaawwa

Good news! After some hard work on my part, initial support for English has now been added. I will try my best not to break lines within words (described by the regular expression [0-9A-Za-z]+).

I'm still using Chinese fonts for now; I'll find a separate set of fonts when I have time.

In addition, I haven't studied the regular expressions for words in other languages yet. Contributions from the community are welcome!

awwaawwa avatar Feb 28 '25 16:02 awwaawwa

Thank you for your reply, and sorry for the misunderstanding. I meant support for translating files from and to other languages, such as Spanish or Japanese, by setting the "--lang-in" or "--lang-out" parameter, which does not work for these languages at the moment.

Thanks again for your excellent work!

gmdig avatar Mar 01 '25 04:03 gmdig

Please use an LLM as the translation engine.

awwaawwa avatar Mar 01 '25 06:03 awwaawwa

Or you can wait for pdf2zh to integrate BabelDOC. The translators in pdf2zh should have better support for multiple languages.

awwaawwa avatar Mar 01 '25 06:03 awwaawwa

For proper word splitting, it is worth looking at the hyphenation solution that Scribus uses, which covers many languages. I have been using Scribus for many years to publish texts.

See file here (scribus/scribus/hyphenator.cpp): https://github.com/scribusproject/scribus/blob/8c471acebc96b62e7f446adf24323d457c094350/scribus/hyphenator.cpp

The source for Scribus is available from our Subversion repository at svn://scribus.net. “trunk” is the primary development branch. Note: there are other replicas of this repository (eg on github) however this is the official, supported code repository.

You can also add support for "short words", which are not left at the end of a line but moved to the next one. See scribus/plugins/short-words: https://github.com/scribusproject/scribus/tree/8c471acebc96b62e7f446adf24323d457c094350/scribus/plugins/short-words
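As a rough illustration of the "short words" idea, the sketch below replaces the ordinary space after a short word with a non-breaking space (U+00A0), so that a line breaker which only breaks at ordinary spaces carries the short word over to the next line together with the word that follows it. This is not the Scribus plugin's code: the function name `bind_short_words` and the word list are hypothetical, and a real list would be language-specific.

```python
import re

NBSP = "\u00a0"  # non-breaking space

# Hypothetical example list; real lists depend on the language.
SHORT_WORDS = frozenset({"a", "i", "o", "w", "z", "on", "to"})


def bind_short_words(text: str, short_words=SHORT_WORDS) -> str:
    """Replace the space after each short word with a non-breaking space."""

    def replace(match: re.Match) -> str:
        word = match.group(1)
        if word.lower() in short_words:
            # Glue the short word to the following word.
            return word + NBSP
        # Not a short word: leave the match (word plus space) unchanged.
        return match.group(0)

    # A word here is a [0-9A-Za-z]+ run followed by one ordinary space.
    return re.sub(r"\b([0-9A-Za-z]+) ", replace, text)


if __name__ == "__main__":
    print(repr(bind_short_words("a cat sat on a mat", {"a", "on"})))
```

Doing this as a preprocessing step keeps the line-breaking logic itself unchanged; the only requirement is that the breaker does not treat U+00A0 as a break opportunity.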

marobe765 avatar Mar 31 '25 08:03 marobe765