PDFMathTranslate
Some letters of an English word at the end of the line are separated and moved to the next line.
2024_compress.pdf
Words at the end of a line, such as "room," "number," "corresponding," "black," "the," and "please," are split, with some of their letters moved to the next line.
The KP algorithm will be introduced later to solve typesetting problems for Western-language text.
This is due to a flaw in the current line-breaking algorithm, and the new backend currently has the same issue. We are aware of the problem, but it will take some time to resolve. Please be patient. Additionally, for Chinese-to-English translation, we are experimenting with dynamic line spacing and dynamic font sizes to improve the results.
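For illustration only, the difference between splitting inside a word and breaking at word boundaries can be sketched as below. This is not pdf2zh's line breaker: `wrap_words`, `max_width`, and `char_width` are hypothetical names, and every character is assumed to have the same width, which real font metrics do not guarantee.

```python
def wrap_words(text: str, max_width: float, char_width: float = 1.0) -> list[str]:
    """Greedy line breaking that only ever breaks at spaces.

    Assumes a fixed width per character (an illustrative simplification).
    """
    lines: list[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        # Accept the word if it fits, or if the line is empty
        # (an over-long word gets a line of its own instead of being split).
        if not current or len(candidate) * char_width <= max_width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


print(wrap_words("please enter the room number", max_width=12))
```

A word that does not fit is moved to the next line whole, so "number" is never rendered as "num" plus "ber".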
Another option is to reduce the font size of the translated text so that it does not overflow, like this.
What do you think?
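A minimal sketch of the shrink-to-fit idea, under stated assumptions: the function name, its parameters, and the fixed `char_aspect` and `line_height` ratios are all hypothetical and are not taken from pdf2zh or the new backend. It lowers the font size step by step until the wrapped text fits the box height.

```python
import textwrap


def fit_font_size(
    text: str,
    box_width: float,
    box_height: float,
    start_size: float = 10.0,
    min_size: float = 4.0,
    step: float = 0.5,
    char_aspect: float = 0.5,   # assumed average glyph width / font size
    line_height: float = 1.2,   # assumed line height / font size
) -> float:
    """Return the largest tried font size at which the text fits the box."""
    size = start_size
    while size > min_size:
        # How many characters fit on one line at this size (at least one).
        chars_per_line = max(1, int(box_width / (size * char_aspect)))
        lines = textwrap.wrap(text, width=chars_per_line)
        if len(lines) * size * line_height <= box_height:
            return size
        size -= step
    # Nothing fit: fall back to the smallest allowed size.
    return min_size


print(fit_font_size("hello", box_width=100, box_height=24))
```

Because `textwrap.wrap` breaks at whitespace, shrinking the font this way avoids overflow without splitting words, at the cost of smaller text in dense paragraphs.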
New backend has been implemented
Will it be merged into this repository?
An experimental integration into pdf2zh may be attempted as early as tomorrow, but the new backend currently has numerous bugs and is temporarily unusable.
The new backend is an independent project, currently designed to serve as a document translation backend. It rewrites almost all of pdf2zh's parsing, translation, and typesetting components, which is a significant amount of work, so its implementation code will not be merged into this repository. However, it was designed from the outset to be usable as a backend for pdf2zh. For example, its translator is kept compatible with pdf2zh's translator API, which makes it relatively easy to integrate into pdf2zh.
The new backend is not intended to be used directly by end users, so it will not support many translators (it currently supports Google, Bing, and OpenAI). The supported translators are there mainly for debugging convenience. The new backend will also not provide a web UI of its own. We hope that end users will use pdf2zh directly, and support for additional translators will be implemented within pdf2zh itself.
I see that a new backend has been implemented to address the issue of letter separation in English (#462). Is this fix already available in any version of pdf2zh? If not, is there a way to test or manually integrate it into the current version?
I was wrong in my previous comment. The new backend has only implemented dynamic scaling for now. The KP algorithm has not been implemented yet. This fix will be available in pdf2zh 2.0.
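For readers unfamiliar with the idea, here is a rough sketch of "total-fit" line breaking, assuming "KP" refers to the Knuth-Plass algorithm. It is a heavily simplified dynamic program, not the full algorithm (no stretchable glue, hyphenation, or penalties) and not what pdf2zh 2.0 will ship. The function name and the fixed-width character model are hypothetical.

```python
def break_lines_total_fit(words: list[str], width: int) -> list[str]:
    """Choose breaks that minimise the sum of squared trailing slack.

    Unlike greedy wrapping, all lines are optimised together.
    The last line is not penalised for being short.
    """
    n = len(words)
    best = [float("inf")] * (n + 1)   # best[i]: minimal cost for words[i:]
    next_break = [n] * (n + 1)        # next_break[i]: end of the line starting at i
    best[n] = 0.0
    for i in range(n - 1, -1, -1):
        line_len = -1  # cancels the space counted before the first word
        for j in range(i, n):
            line_len += len(words[j]) + 1
            if line_len > width and j > i:
                break  # line is too long; a lone over-long word is still allowed
            slack = max(width - line_len, 0)
            cost = 0 if j == n - 1 else slack ** 2
            if cost + best[j + 1] < best[i]:
                best[i] = cost + best[j + 1]
                next_break[i] = j + 1
    lines, i = [], 0
    while i < n:
        j = next_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


print(break_lines_total_fit("aaa bb cc ddddd".split(), width=6))
```

With a width of 6, a greedy breaker would produce "aaa bb", "cc", "ddddd", leaving a very short middle line. The sketch above instead prefers "aaa", "bb cc", "ddddd", which spreads the slack more evenly.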
BabelDOC can be used directly, with a simple CLI. https://github.com/funstory-ai/BabelDOC
This also happens in Spanish, and quite often: words are cut.
pdf2zh 2.0 does a bit better, but unfortunately it doesn't justify text (line ends aligned to both the left and right margins).
@igarca, please post a new issue in 2.0 repo. : ) This repo's issue is not active.