PDFMathTranslate Some letters of an English word at the end of the line are separated and moved to the next line.

In 2024_compress.pdf, some words at the end of lines, such as "room," "number," "corresponding," "black," "the," and "please," have some of their letters split off and moved to the next line.

Jan 14 '25 01:01 THD0813

We will later introduce the KP (Knuth-Plass) algorithm to fix typesetting issues for Western-language text.

Jan 14 '25 02:01 Byaidu
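For readers unfamiliar with the KP approach mentioned above: instead of filling each line greedily, it chooses all line breaks of a paragraph together, only at legal breakpoints, minimizing a total "badness" score. The sketch below is a simplified minimum-raggedness variant (no hyphenation, stretch/shrink, or penalties, and width measured in characters rather than font metrics). It is not pdf2zh's or BabelDOC's code; `break_lines` is a hypothetical helper.

```python
def break_lines(words: list[str], width: int) -> list[str]:
    """Break words into lines of at most `width` characters, minimizing the sum
    of squared trailing slack over all lines except the last.

    Breaks happen only between words, so a word is never split; a word longer
    than `width` is placed on a line by itself.
    """
    n = len(words)
    inf = float("inf")
    # best[i] = minimal cost of laying out words[i:]; nxt[i] = end of its first line
    best = [0.0] * (n + 1)
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        best[i] = inf
        line_len = -1  # compensates for the space counted before the first word
        for j in range(i + 1, n + 1):
            line_len += len(words[j - 1]) + 1
            if line_len > width and j > i + 1:
                break  # adding more words would overflow the line
            slack = max(width - line_len, 0)
            cost = 0.0 if j == n else float(slack ** 2)  # last line is free
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                nxt[i] = j
    # Follow the chosen breakpoints to rebuild the lines.
    lines: list[str] = []
    i = 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines
```

For example, `break_lines("aaa bb cc ddddd".split(), 6)` returns `["aaa", "bb cc", "ddddd"]`, whereas greedy filling gives `["aaa bb", "cc", "ddddd"]`, leaving one very short line.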

This is due to a flaw in the current line-breaking algorithm, and the new backend currently has the same issue. We are aware of the problem, but it will take some time to resolve, so please be patient. Additionally, for Chinese-to-English translation, we are experimenting with dynamic line spacing and dynamic font sizes to improve the results.

Jan 14 '25 02:01 awwaawwa

Another option is to reduce the font size of the translated text so that it doesn't overflow. How about something like this?

Jan 15 '25 10:01 xxnuo
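The font-shrinking proposal above can be sketched as a loop that lowers the font size step by step until the wrapped text fits its box. This is a minimal illustration, not how pdf2zh or BabelDOC implement dynamic scaling: `fit_font_size` and `wrap_words` are hypothetical helpers, and the glyph-width and line-height ratios are crude assumptions standing in for real font metrics.

```python
def wrap_words(text: str, max_chars: int) -> list[str]:
    """Greedy word wrap that never splits a word (an overlong word gets its own line)."""
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def fit_font_size(text: str, box_width: float, box_height: float,
                  start_size: float, min_size: float = 4.0, step: float = 0.5,
                  char_width_ratio: float = 0.5, line_height_ratio: float = 1.2
                  ) -> tuple[float, list[str]]:
    """Shrink the font size until the wrapped text fits in the box.

    Assumptions (hypothetical, not real font metrics): an average glyph is
    char_width_ratio * size wide and a line is line_height_ratio * size tall.
    Stops at min_size even if the text still overflows.
    """
    size = start_size
    while True:
        max_chars = max(int(box_width // (size * char_width_ratio)), 1)
        lines = wrap_words(text, max_chars)
        fits = len(lines) * size * line_height_ratio <= box_height
        if fits or size - step < min_size:
            return size, lines
        size -= step
```

With a 100 × 30 box and a starting size of 12, `fit_font_size("Some words at the end of lines have letters moved", 100, 30, 12)` settles on size 8.0 with two lines. A real implementation would measure glyph widths from the font and might prefer adjusting line spacing before shrinking the text.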

A new backend has been implemented.

Jan 15 '25 10:01 awwaawwa

> A new backend has been implemented.

Will it be merged into this repository?

Jan 15 '25 10:01 xxnuo

Tentatively, we may attempt an experimental integration into pdf2zh tomorrow, but the new backend currently has many bugs and is not yet usable.

Jan 15 '25 10:01 awwaawwa

The new backend is an independent project, currently designed to serve as a document translation backend. It almost completely rewrites the parsing, translation, and typesetting components of pdf2zh, which involved a significant amount of work, so its implementation code will not be merged into this repository. However, it was designed from the outset to be usable as a pdf2zh backend; for example, its translators are compatible with pdf2zh's translator API, which makes it relatively easy to integrate into pdf2zh.

Jan 15 '25 10:01 awwaawwa

The new backend is not intended for direct use by end users, so it will support only a few translators (currently Google, Bing, and OpenAI), mainly for debugging convenience. It will also not provide its own web UI. We hope end users will use pdf2zh directly; support for additional translators will be implemented in pdf2zh itself.

Jan 15 '25 10:01 awwaawwa

I see that a new backend has been implemented to address the issue of letter separation in English (#462). Is this fix already available in any version of pdf2zh? If not, is there a way to test or manually integrate it into the current version?

Feb 27 '25 05:02 andygeek

I was wrong in my previous comment. The new backend has only implemented dynamic scaling for now. The KP algorithm has not been implemented yet. This fix will be available in pdf2zh 2.0.

BabelDOC can be used directly, with a simple CLI. https://github.com/funstory-ai/BabelDOC

Feb 27 '25 05:02 awwaawwa

This also happens in Spanish, and quite often: words get cut.

pdf2zh 2.0 does a bit better, but unfortunately it doesn't justify text (i.e., align line ends to both the left and right margins).

Sep 28 '25 01:09 igarca
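For reference, fully justifying a single line amounts to spreading the leftover width evenly across the gaps between words. A minimal sketch, assuming word widths are already measured in the same unit as the line width (e.g. points); `justify_positions` is a hypothetical helper, not part of pdf2zh.

```python
def justify_positions(word_widths: list[float], line_width: float) -> list[float]:
    """Return the x offset of each word so the line spans exactly line_width.

    The leftover width is split evenly among the gaps between words, so the
    first word starts at 0 and the last word ends at line_width. Lines with a
    single word are left-aligned (the last line of a paragraph usually is too).
    If the words are wider than the line, the gap becomes negative.
    """
    if len(word_widths) <= 1:
        return [0.0] * len(word_widths)
    gap = (line_width - sum(word_widths)) / (len(word_widths) - 1)
    positions: list[float] = []
    x = 0.0
    for width in word_widths:
        positions.append(x)
        x += width + gap
    return positions
```

For words 10, 20, and 30 units wide on a 100-unit line, `justify_positions([10, 20, 30], 100)` returns `[0.0, 30.0, 70.0]`, so the last word ends exactly at the right margin.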

@igarca, please open a new issue in the 2.0 repo. :) Issues in this repo are no longer active.

Sep 28 '25 09:09 hellofinch
