NextChat
                                
[Bug] Incorrect Handling of Mixed LaTeX Math Symbols and Natural Language Text with Dollar Signs
Bug Description
To fix issue #2841, the text buffer is now modified by escapeDollarNumber:
// app/components/markdown.tsx
function escapeDollarNumber(text: string) {
  let escapedText = "";
  for (let i = 0; i < text.length; i += 1) {
    let char = text[i];
    const nextChar = text[i + 1] || " ";
    if (char === "$" && nextChar >= "0" && nextChar <= "9") {
      char = "\\$";
    }
    escapedText += char;
  }
  return escapedText;
}
However, the current algorithm breaks every LaTeX formula that starts with a number. Even $1 + 1 = 2$ is not displayed correctly.
I also have a real-world example:
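For example, in the expression $\lambda . \lambda . 1$, the innermost $1$ is closed, because its index $1$ equals its depth $1$ in the expression. Likewise, in the expression $\lambda . \lambda . 2$, the innermost $2$ is also closed, because its index $2$ equals its depth $2$ in the expression.
If a variable's index is greater than the depth at which it occurs, it is considered free. For example, in the expression $\lambda . 2$, $2$ is a free variable, because its index $2$ is greater than its depth $1$ in the expression.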
I know this is a difficult problem to solve, but cases like this are not rare.
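To make the failure concrete, here is a small self-contained sketch that runs the function quoted above on a price and on a formula. Only `escapeDollarNumber` comes from the issue; the sample strings and the names `escapedPrice` and `escapedFormula` are made up for illustration.

```typescript
// Copy of escapeDollarNumber as quoted above from app/components/markdown.tsx.
function escapeDollarNumber(text: string): string {
  let escapedText = "";
  for (let i = 0; i < text.length; i += 1) {
    let char = text[i];
    const nextChar = text[i + 1] || " ";
    // Any "$" directly followed by a digit is escaped, math or not.
    if (char === "$" && nextChar >= "0" && nextChar <= "9") {
      char = "\\$";
    }
    escapedText += char;
  }
  return escapedText;
}

// Illustrative inputs (not from the issue).
const escapedPrice = escapeDollarNumber("It costs $5 today");
const escapedFormula = escapeDollarNumber("$1 + 1 = 2$");

console.log(escapedPrice);   // It costs \$5 today
console.log(escapedFormula); // \$1 + 1 = 2$
```

The price is handled as intended, but the opening delimiter of the formula is escaped as well, so only an unmatched closing `$` is left for the math parser.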
Steps to Reproduce
- Start a new chat.
- Say: "Please output 1 + 1 = 2 in LaTeX".
- GPT replies in LaTeX, and the formula is not displayed correctly.
Expected Behavior
Most LaTeX formulas that start with a number should be displayed correctly.
Screenshots
No response
Deployment Method
- [X] Docker
- [ ] Vercel
- [ ] Server
Desktop OS
No response
Desktop Browser
No response
Desktop Browser Version
No response
Smartphone Device
No response
Smartphone OS
No response
Smartphone Browser
No response
Smartphone Browser Version
No response
Additional Logs
No response
I know this is related to the following issues:
* [[Bug] LaTeX 渲染异常 #4155](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/4155)
* [[Bug] latex 公式渲染 问题 #3964](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/3964)
* [[Bug] LaTeX Syntax still bug #3239](https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/3239)
The final solution has not been confirmed yet? Honestly, without the merged pull request for fixing the dollar sign issue, further improvements are out of the question.
This issue is challenging to resolve. I'm not convinced it's feasible to fix given its complexity, particularly for the frontend and the React Markdown. It might be more practical to create a simpler, standalone package rather than dealing with the complexities of this issue.
Regardless of how the code is encapsulated, it seems that there is no way to avoid using complex logic and regular expressions to address this issue. I conducted a brief search for Markdown rendering packages in Node.js, and it appears that almost all packages have given up on properly handling the rendering of the dollar sign. The maintainers seem to have chosen a rather passive approach of not addressing such rendering issues.
The issue might be the only valuable thing there; markdown-it doesn't support LaTeX at all, as for react-markdown, you know.
Frankly, if everyone continues to handle this issue with a negative attitude, it may eventually be left to LLM for maintenance.
I believe there's always a way to resolve this without resorting to complex logic and excessive use of regular expressions. It's just that I currently don't have the time to do it.
I took a quick look at the example of markdown-to-jsx, and it seems that it requires writing LaTeX rendering conditions. This task seems a bit simpler compared to what we are currently working on, at least we don't have to replace dollar signs. However, the question is whether it's worth refactoring the code.
Honestly, if there are no existing solutions available, our choices might be limited.
I have run into this problem too. Could we use a bare $ to mark math syntax, and `$` (wrapped in backticks) to mark a price or something similar?
The backticks around `$ xxx` could be added by the LLM through the prompt.
Just some simple ideas.
Adding this to the prompt may not be a good idea, since it costs extra tokens on every request.
I found a solution: https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/pull/4354 I tested it with different examples, and it worked.
Would you like to share some insight into your PR? I cannot understand the complex regex used in your code:
/(?<!`)\$(\d+(?:[.,]\d+)*)(?=\s*[a-zA-Z.,;!?]?\s*$|\s+[a-zA-Z]|\s+\$)(?!`)/g
I came up with it with the help of LLMs.
Here is an explanation:
- Ensure that the dollar sign ($) is not preceded by a backtick (`).
- Match the dollar sign ($).
- Match one or more digits (\d+).
- Optionally match a decimal separator (. or ,) followed by one or more digits ((?:[.,]\d+)*).
- Ensure that the matched dollar amount is followed by:
  - Either the end of the line ($), or
  - A non-word character (e.g., a punctuation mark such as . , ; ! ?) and then the end of the line, or
  - A word character (e.g., a letter) preceded by one or more whitespace characters, or
  - Another dollar sign ($) preceded by one or more whitespace characters.
- Ensure that the dollar sign is not followed by a backtick (`).
- The g flag at the end makes the regular expression global, meaning it will match all occurrences in the text.
I think it still can be improved upon.
I noticed an issue with the regex and fixed it. It is now:
/(?<!`)\$(\d+(?:[.,]\d+)*)(?=\s*[.,;!?]\s*\B|\s+[a-zA-Z]|\s+\$)(?!`)/g
    // Regex explanation:
    // (?<!`)                 # Negative lookbehind to ensure the '$' is not preceded by a backtick (`)
    // \$                     # Match a literal '$' character
    // (\d+(?:[.,]\d+)*)      # Capture group 1: Match one or more digits, optionally followed by a decimal part (e.g., 123.45)
    // (?=                    # Positive lookahead to ensure the following conditions are met:
    //   \s*[.,;!?]\s*\B      #   The number is followed by a punctuation mark (.,;!?) and a non-word boundary
    //   |                    #   OR
    //   \s+[a-zA-Z]          #   The number is followed by one or more whitespace characters and a letter
    //   |                    #   OR
    //   \s+\$                #   The number is followed by one or more whitespace characters and a '$' sign
    // )
    // (?!`)                  # Negative lookahead to ensure the '$' is not followed by a backtick (`)
    // /g                     # Global flag to replace all occurrences
Thanks for your explanation. I tried some examples:
function check(line) {
     console.log(line.replace(/(?<!`)\$(\d+(?:[.,]\d+)*)(?=\s*[.,;!?]\s*\B|\s+[a-zA-Z]|\s+\$)(?!`)/g, '\\$&'));
}
check('The price of xxx is $1')
check('The price of xxx is $1. You can buy it for $0.95 or lower')
check('例如,在表达式$\lambda . \lambda . 1$中,最内层的$1$是封闭的,因为它的索引值$1$等于它在表达式中的深度$1$。同样,在表达式$\lambda . \lambda . 2$中,最内层的$2$也是封闭的,因为它的索引值$2$等于它在表达式中的深度$2$')
check('$1 + 1 = 2$')
The output:
The price of xxx is $1
The price of xxx is \$1. You can buy it for \$0.95 or lower
例如,在表达式$lambda . lambda . 1$中,最内层的$1$是封闭的,因为它的索引值$1$等于它在表达式中的深度$1$。同样,在表达式$lambda . lambda . 2$中,最内层的$2$也是封闭的,因为它的索引值$2$等于它在表达式中的深度$2$
$1 + 1 = 2$
It seems your solution works fine in these examples.
It is now:
/(?<!`|\\)\$(\d+(?:[.,]\d+)*)(?=\s*[.,;!?]\s*\B|\s+[a-zA-Z]|\s+\$|$)(?!`)/g
I noticed the issues in your output and fixed them. Update: fixing other scenarios and rare use cases.
/(?<!`|\\)\$(\d+(\w+)?(?:[.,]\d+(\w+)?)*)(?=\s*[.,;?]\s*\B|!?\s+[a-zA-Z]|!?\s+\$|!?\s*[-=+\/]\s*\$\b|$)(?!`)/g
Update: I went about it the wrong way. The new PR is the one to use; it has a better and shorter regex, covering all cases. https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/pull/4363
/(?<!`|\\)\$\d+([,.](\d+[,.])?\d+)?(?!.*\$\B)(?!`)/g
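A minimal sketch of how this regex behaves on a few inputs. The regex is copied from the line above; the constant name `PRICE_DOLLAR`, the wrapper `escapePriceDollars`, and the `"\\$&"` replacement (borrowed from the `check` helper earlier in this thread) are assumptions for illustration and not necessarily what the PR does.

```typescript
// Regex as quoted above; everything else in this block is hypothetical.
const PRICE_DOLLAR = /(?<!`|\\)\$\d+([,.](\d+[,.])?\d+)?(?!.*\$\B)(?!`)/g;

function escapePriceDollars(text: string): string {
  // "$&" re-inserts the whole match, so "$5" becomes "\$5".
  return text.replace(PRICE_DOLLAR, "\\$&");
}

const samples = [
  "The price of xxx is $1",
  "You can buy it for $0.95 or lower",
  "$1 + 1 = 2$",
  "It costs $5, and $x = 5$",
];
for (const s of samples) {
  console.log(escapePriceDollars(s));
}
// The price of xxx is \$1
// You can buy it for \$0.95 or lower
// $1 + 1 = 2$
// It costs $5, and $x = 5$
```

The last sample is left untouched because the `(?!.*\$\B)` lookahead rejects any amount that is followed, later on the same line, by a `$` that is not directly followed by a word character. A price and a formula on one line can therefore still interact.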
Thanks to the contribution of Algorithm5838, this problem has now been solved.
Unfortunately, there are still some issues:
- If you do not use the Inject System Prompt option, the issue persists, as the LLM might still use single dollar signs for inline LaTeX.
- The same problem is present in block LaTeX: if the double dollar signs are followed by a number, the LaTeX rendering breaks. These first two issues are related, because in both the dollar sign(s) are followed by a number.
- Another issue is that if the dollar sign and number are inside a code block or inline code, a stray backslash is rendered.
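For illustration only, the third point could be approached by applying the escaping step to text outside code and leaving code untouched. This is a rough sketch and not the code from my fork; `mapOutsideCode` and `escapeDollarDigit` are hypothetical names, and the escaping function is a deliberately naive stand-in.

```typescript
// Hypothetical helper: apply `transform` only to text outside code.
function mapOutsideCode(
  text: string,
  transform: (segment: string) => string,
): string {
  // The capturing group keeps fenced blocks and inline code in the
  // split result, where they land at the odd indexes.
  const parts = text.split(/(```[\s\S]*?```|`[^`\n]*`)/);
  return parts
    .map((part, i) => (i % 2 === 1 ? part : transform(part)))
    .join("");
}

// Naive stand-in for the real escaping step: escape "$" before a digit.
const escapeDollarDigit = (s: string): string => s.replace(/\$(?=\d)/g, "\\$");

const rendered = mapOutsideCode(
  "Pay $5 or run `echo $1` in a shell.",
  escapeDollarDigit,
);
console.log(rendered); // Pay \$5 or run `echo $1` in a shell.
```

The split is purely textual, so unterminated fences or nested backticks would need more careful handling than shown here.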
And here is a related issue: https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/4537
My workaround has fixed these three issues.
Current implementation:
My workaround:
You can try it yourself. Here is a deployed instance of my fork (https://github.com/Algorithm5838/NextChat/tree/dollar-sign): https://nextchat-git-dollar-sign-algorithm5838s-projects.vercel.app/
Perhaps this problem will never be solved.
@daiaji Did you try my workaround? If so, how did you find it?
Sorry, I just feel very frustrated.
As you can see, I submitted this PR. Honestly, even though GPT has provided a lot of help and it has taken up a significant amount of my time, it seems that the problem is still far from being solved.
That's all for now.😔
I understand your frustration. It can be disheartening when you've put in a significant amount of time and effort into a pull request and the problem still remains unsolved.
In my opinion, this issue should definitely be addressed in the remark parser. The parser should correctly identify what is math and what is a US dollar symbol. Interestingly, I have never encountered such a problem when using pandoc (for converting and blogging, see My Blog Project). This is because pandoc uses a stronger rule for markdown math, as documented in pandoc's user guide:
Extension: tex_math_dollars Anything between two $ characters will be treated as TeX math. The opening $ must have a non-space character immediately to its right, while the closing $ must have a non-space character immediately to its left, and must not be followed immediately by a digit. Thus, $20,000 and $30,000 won’t parse as math. If for some reason you need to enclose text in literal $ characters, backslash-escape them and they won’t be treated as math delimiters.
I have tested my inputs, and all of them are correctly handled by pandoc. Most of the time, the output of ChatGPT follows this guideline. So I'd like to figure out why remark doesn't use this rule.
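To make the quoted rule concrete, here is a rough sketch of it as a small scanner. It is neither pandoc's implementation nor a remark plugin; `findInlineMath` and `findClosingDollar` are hypothetical names, and display math (`$$`) is simply skipped.

```typescript
// Sketch of the tex_math_dollars rule as quoted above (inline math only).
function findInlineMath(text: string): string[] {
  const spans: string[] = [];
  let i = 0;
  while (i < text.length) {
    const ch = text.charAt(i);
    if (ch === "\\") { i += 2; continue; }   // backslash-escaped character
    if (ch !== "$") { i += 1; continue; }
    const next = text.charAt(i + 1);
    if (next === "$") { i += 2; continue; }  // "$$" display math: out of scope
    // The opening $ needs a non-space character immediately to its right.
    if (next === "" || /\s/.test(next)) { i += 1; continue; }
    const close = findClosingDollar(text, i + 1);
    if (close === -1) { i += 1; continue; }
    spans.push(text.slice(i + 1, close));
    i = close + 1;
  }
  return spans;
}

function findClosingDollar(text: string, from: number): number {
  for (let j = from; j < text.length; j += 1) {
    const ch = text.charAt(j);
    if (ch === "\\") { j += 1; continue; }   // also skip the escaped character
    if (ch !== "$") continue;
    // The closing $ needs a non-space character on its left and
    // must not be followed immediately by a digit.
    const left = text.charAt(j - 1);
    const right = text.charAt(j + 1);
    if (!/\s/.test(left) && !/\d/.test(right)) return j;
  }
  return -1;
}

console.log(findInlineMath("$1 + 1 = 2$"));         // [ '1 + 1 = 2' ]
console.log(findInlineMath("$20,000 and $30,000")); // []
```

With this rule the formula that starts with a digit is recognised, while the two prices are left alone, without escaping anything up front.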
Anything related to LaTeX can't really be fixed anyway, because the module conflicts with the front-end CSS and UI/UX.