open-webui
Fix: LaTeX rendering when using CJK
Pull Request Checklist
Note to first-time contributors: Please open a discussion post in Discussions and describe your changes before submitting a pull request.
Before submitting, make sure you've checked the following:
- [x] Target branch: Please verify that the pull request targets the dev branch.
- [x] Description: Provide a concise description of the changes made in this pull request.
- [x] Changelog: Ensure a changelog entry following the format of Keep a Changelog is added at the bottom of the PR description.
- [x] Documentation: Have you updated relevant documentation Open WebUI Docs, or other documentation sources?
- [x] Dependencies: Are there any new dependencies? Have you updated the dependency versions in the documentation?
- [x] Testing: Have you written and run sufficient tests for validating the changes?
- [x] Code review: Have you performed a self-review of your code, addressing any coding standard issues and ensuring adherence to the project's coding standards?
- [x] Prefix: To clearly categorize this pull request, prefix the pull request title using one of the following:
- BREAKING CHANGE: Significant changes that may affect compatibility
- build: Changes that affect the build system or external dependencies
- ci: Changes to our continuous integration processes or workflows
- chore: Refactor, cleanup, or other non-functional code changes
- docs: Documentation update or addition
- feat: Introduces a new feature or enhancement to the codebase
- fix: Bug fix or error correction
- i18n: Internationalization or localization changes
- perf: Performance improvement
- refactor: Code restructuring for better maintainability, readability, or scalability
- style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc.)
- test: Adding missing tests or correcting existing tests
- WIP: Work in progress, a temporary label for incomplete or ongoing work
Changelog Entry
Description
- Fix LaTeX rendering directly after CJK punctuation
Fixed
- LaTeX rendering: LaTeX formulas adjacent to CJK text now render correctly.
Extra comments
Related discussions: https://github.com/open-webui/open-webui/discussions/9273#discussion-7909298 https://github.com/open-webui/open-webui/discussions/10281#discussion-7985805
The problem was found with Chinese text. LLMs do not output a space after CJK punctuation (especially :,、), so the LaTeX that follows does not render.
Example: "加法结合律:$\vec{u} + (\vec{v} + \vec{w}) = (\vec{u} + \vec{v}) + \vec{w}$" (the Chinese text reads "Associative law of addition:") does not render under v0.5.18, but renders after this fix.
Tested with Claude 3.7 Sonnet, with no extra prompts. The problem is resolved for Chinese and Japanese. Sample test prompt: "Introduce me to Linear Algebra using LaTeX in Chinese."
It also fixes the case where LaTeX directly follows Chinese text (without a space). Example: "加法结合律是$\vec{u} + (\vec{v} + \vec{w}) = (\vec{u} + \vec{v}) + \vec{w}$"
For English, however, spaces are still required for LaTeX to render. Test results look good, but I'm not sure whether this affects performance. A double check is needed.
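To make the intended behavior concrete, here is a minimal sketch of the delimiter rule described above, written in Python for illustration only. It is not the code changed in this PR; the `CJK` ranges, the `INLINE_MATH` pattern, and the `find_inline_math` helper are all assumptions made for this example. An opening `$` is accepted at the start of the text, after whitespace, or after a CJK character; after a Latin letter it is still rejected.

```python
import re

# Hypothetical character ranges (an assumption for this sketch):
# CJK symbols and punctuation, kana, common ideographs, fullwidth forms.
CJK = r"\u3000-\u303f\u3040-\u30ff\u4e00-\u9fff\uff00-\uffef"

# The negative lookbehind/lookahead read as "the neighboring character must
# not be something other than whitespace or CJK", which also succeeds at the
# very start or end of the text.
INLINE_MATH = re.compile(
    rf"(?<![^\s{CJK}])\$(?!\s)(.+?)(?<!\s)\$(?![^\s.,;:!?{CJK}])"
)


def find_inline_math(text: str) -> list[str]:
    """Return the bodies of inline $...$ spans accepted by the rule above."""
    return [m.group(1) for m in INLINE_MATH.finditer(text)]


# A formula directly after a Chinese character is accepted.
cjk_hits = find_inline_math(r"加法结合律是$\vec{u} + \vec{v}$")

# A formula glued to English words is still rejected; spaces are required.
english_hits = find_inline_math("the sum$a+b$is here")
```

With these inputs, `cjk_hits` holds one formula body and `english_hits` is empty, which mirrors the behavior reported in the tests above.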
Could you confirm the changes also work with at least: https://github.com/open-webui/open-webui/discussions/3581
Sorry, I think I missed something here. The current fix applies only to expressions directly adjacent to CJK characters and does not cover issue #3581, which may be caused by other factors. Also, testing was done mostly with flagship models, which (in our testing) format LaTeX better.
Also, the current fix might break variants like [...], although it works well with $...$.
I will try another approach, but based on our testing, we recommend using a prompt to make sure the model outputs spaces and $...$. This might be an alternative solution.
For reference, this is the prompt that @richards199999 and I use for Claude 3.7 Sonnet:
The assistant is Claude, created by Anthropic.
The current time and date is {{CURRENT_DATE}}, {{CURRENT_TIME}}. The human is at Asia/Shanghai time zone, Claude would always reference this.
Claude is happy to engage in conversation with the human when appropriate. Claude engages in authentic conversation by responding to the information provided, asking specific and relevant questions, showing genuine curiosity, and exploring the situation in a balanced way without relying on generic statements. This approach involves actively processing information, formulating thoughtful responses, maintaining objectivity, knowing when to focus on emotions or practicalities, and showing genuine care for the human while engaging in a natural, flowing dialogue that is at the same time focused and succinct.
Claude can use webpages (aka Artifacts) and Mermaid to provide visualized content for humans. Any webpages and Mermaid embedded in corresponding code blocks will be rendered. Claude knows when to provide web slices and charts to make things clear.
Claude uses proper Markdown and LaTeX format in its response.
Claude is aware of how overall conversation flows between topics, tones and styles. It NEVER starts the response with headline, which would be quite annoying.
Claude is now connected with the human.
<tool_info>
Human is able to enable web searching and Python code execution (aka Code Interpreter) via the interface. When they have enabled it, Claude will automatically receive the tool result (i.e. search results or code execution result); when it does not see anything, it means the human has not enabled the feature, and Claude needs to ask them to do so when necessary.
</tool_info>
<latex_info>
The assistant can render a wide range of LaTeX equations and expressions, including most math notation and many advanced commands.
Inline equations are denoted with $...$
Block equations are denoted with:
$$
...
$$
当使用中文时,输出 $latex$ 公式前后始终使用空格间隔。
</latex_info>
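This prompt works well with Sonnet and, combined with this fix, resolves almost all LaTeX rendering problems in Chinese and English. (The last line of latex_info tells the model, in Chinese, to always put a space before and after LaTeX formulas.) But it does not work for Google models (which use [...] and format their output poorly).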
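The spacing rule that the prompt asks the model to follow could, in principle, also be applied to the model output as a post-processing step. The following is only a rough Python sketch of that idea, not something this PR implements; `pad_inline_math`, `SPAN`, and the `CJK` ranges are hypothetical names chosen for the example, and the heuristic only handles single-line `$...$` spans.

```python
import re

# Hypothetical character ranges (an assumption for this sketch):
# CJK symbols and punctuation, kana, common ideographs, fullwidth forms.
CJK = r"\u3000-\u303f\u3040-\u30ff\u4e00-\u9fff\uff00-\uffef"

# An inline span: a single "$", a body with no "$" or newline that does not
# start or end with whitespace, then a single "$". "$$" blocks never match.
SPAN = r"(?<!\$)\$(?![\s$])[^$\n]+?(?<!\s)\$(?!\$)"


def pad_inline_math(text: str) -> str:
    """Insert a space between CJK characters and adjacent inline $...$ spans."""
    # A CJK character directly before a span: "中$x$" becomes "中 $x$".
    text = re.sub(rf"([{CJK}])({SPAN})", r"\1 \2", text)
    # A CJK character directly after a span: "$x$和" becomes "$x$ 和".
    text = re.sub(rf"({SPAN})([{CJK}])", r"\1 \2", text)
    return text


padded = pad_inline_math("其中$x$和$y$是向量")
```

Text that is already spaced, English text, and `$$` block delimiters pass through this function unchanged. It does nothing for models that emit [...] instead of $...$.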
Let me keep working on a fix, but it's almost impossible to make it work for every model. I will convert this to a draft until I can bring something better.