chat-ui
[v0.9.1] Formatting issues while rendering code
@nsarrazin Whenever I ask chat-ui to explain / generate code, the < does not get rendered correctly. Can you please take a look?
If you still have access, could you send me the raw conversation that shows this behaviour?
There's a download button next to user messages in the UI.
OK. Think I can explain this one, and offer an improvement.
Code blocks in markdown can be either fenced (```html) or indented by 4 spaces.
The issue arises when the LLM responds with a code block that is both fenced AND indented.
In this case I think the correct behaviour is to show a code block, with the fences displayed as part of the code. VSCode and https://markdownlivepreview.com/ do this.
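A rough way to spot the "fenced AND indented" case in raw LLM output is sketched below. `findIndentedFences` is a hypothetical helper written for this comment, not part of chat-ui or marked, and it only looks at leading whitespace on fence lines.

```javascript
// Hypothetical helper: report fence lines that are indented by 4+ spaces
// (or a tab), i.e. fences that would be read as indented code at top level.
function findIndentedFences(markdown) {
  const hits = [];
  markdown.split("\n").forEach((line, index) => {
    const match = /^( {4,}|\t)\s*(`{3,}|~{3,})/.exec(line);
    if (match) {
      hits.push({ line: index + 1, text: line });
    }
  });
  return hits;
}

const sample = [
  "Top-level fence:",
  "```html",
  "<foo />",
  "```",
  "",
  "Indented fence:",
  "",
  "    ```html",
  "    <foo />",
  "    ```",
].join("\n");

// Prints the 1-based line numbers of the indented fence lines.
console.log(findIndentedFences(sample).map((hit) => hit.line));
```

This check is deliberately naive: inside a list item, indentation is measured relative to the item's content, so a fence flagged here may still be a valid fenced block.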
What is happening in Chat-UI seems to be:
- The marked lexer does not pick this up as a code block, meaning that `<CodeBlock>` isn't used.
- The marked renderer does, and emits `<pre>` and `<code>` tags, causing the styling to look similar to a correctly rendered code block and the `&lt;` to go through as-is. Note that the Copy to Clipboard button is not present because it hasn't been rendered by CodeBlock.
- The behaviour is incorrect, as in this case it should be including the triple backticks as part of its display (although I'd expect that in >99% of cases the user would prefer standard CodeBlock behaviour and the LLM has made a mistake).
In looking at this, I've bumped the marked library to 13.0.3 and then 14.0 (to see whether this fix, https://github.com/markedjs/marked/pull/3264, would make a difference; it doesn't). The bump does change the interface a little, but that is fairly easy to update.
In the meantime, adding this to ChatMessage returns the `&lt;` to `<` and renders the fences, although it doesn't benefit from the Syntax Highlighting and Copy to Clipboard button that `<CodeBlock>` provides.
renderer.code = (code) => {
return `<pre><code>${sanitizeMd(code.raw)}</code></pre>`;
}
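For anyone wondering how a literal `&lt;` can end up on screen at all, here is a minimal sketch. It assumes the text has already had `<` replaced with `&lt;` once before a second escaping pass runs over it; that is my guess at the mechanism, not something confirmed in the code. `escapeHtml` and `decodeEntities` are hypothetical helpers for illustration only, not chat-ui's `sanitizeMd` or anything from marked.

```javascript
// Hypothetical escaper: what an HTML renderer typically does to code text.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Hypothetical stand-in for the browser turning entities back into characters.
// "&amp;" is decoded last so that "&amp;lt;" becomes "&lt;" and not "<".
function decodeEntities(html) {
  return html
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&amp;/g, "&");
}

const source = "<foo />";
const escapedOnce = escapeHtml(source);
const escapedTwice = escapeHtml(escapedOnce);

console.log(decodeEntities(escapedOnce)); // <foo />
console.log(decodeEntities(escapedTwice)); // &lt;foo /&gt;
```

If that assumption holds, any path that escapes exactly once displays the tag correctly, and any path that escapes twice shows the entity text instead.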
@nsarrazin - let me know if you want this as a PR - I think this improves behaviour but isn't a proper fix. In the meantime, I'm going to tweak my System prompt to advise the LLM against using indents for Code Blocks, and ultimately think this is a defect in the marked library?
:(
:) with additional render hook
Leaving the previous comment, however I am not 100% confident on whether indents are the only cause after more testing. The output below uses <CodeBlock> from list entries 10 onward...
"updates": [
{
"type": "status",
"status": "started"
},
{
"type": "title",
"title": "💻 HTML tags"
},
{
"type": "finalAnswer",
"text": "Certainly! Here are some common HTML tags with a brief description:\n\n1. **`<html>`**: The root element of an HTML page. It encloses all the content of the page.\n\n ```html\n <html>\n </html>\n ```\n\n2. **`<head>`**: Contains meta-information about the HTML document, like title and links to scripts and stylesheets.\n\n ```html\n <head>\n </head>\n ```\n\n3. **`<title>`**: Defines the title of the document, shown in the browser's title bar or tab.\n\n ```html\n <title>Page Title</title>\n ```\n\n4. **`<body>`**: Contains the content of the document, such as text, images, and hyperlinks.\n\n ```html\n <body>\n </body>\n ```\n\n5. **`<h1>` to `<h6>`**: Header tags, `<h1>` is the highest level and `<h6>` is the lowest. They define headings.\n\n ```html\n <h1>Main Heading</h1>\n ```\n\n6. **`<p>`**: Represents a paragraph of text.\n\n ```html\n <p>This is a paragraph.</p>\n ```\n\n7. **`<a>`**: The anchor tag, used to create hyperlinks.\n\n ```html\n <a href=\"https://www.example.com\">Visit Example</a>\n ```\n\n8. **`<img>`**: Embeds an image in the document. It should include the `src` (source) attribute and often an `alt` (alternative text) attribute.\n\n ```html\n <img src=\"image.jpg\" alt=\"Description of image\">\n ```\n\n9. **`<ul>` / `<ol>`**: Unordered (bulleted) and ordered (numbered) list containers, respectively.\n\n ```html\n <ul>\n <li>List item 1</li>\n <li>List item 2</li>\n </ul>\n ```\n\n ```html\n <ol>\n <li>First item</li>\n <li>Second item</li>\n </ol>\n ```\n\n10. **`<li>`**: Represents a list item, used within `<ul>` or `<ol>`.\n\n ```html\n <li>A list item</li>\n ```\n\n11. **`<div>`**: A generic container for content, often used for styling or layout purposes.\n\n ```html\n <div>This is a division.</div>\n ```\n\n12. **`<span>`**: A generic inline container, typically used to apply styles or scripts.\n\n ```html\n <span style=\"color:blue\">This is a blue text.</span>\n ```\n\n13. 
**`<input>`**: Represents an input field in a form, where data can be entered.\n\n ```html\n <input type=\"text\" name=\"username\">\n ```\n\n14. **`<button>`**: Represents a clickable button.\n\n ```html\n <button>Click me</button>\n ```\n\nRemember, these are just foundational tags, and HTML supports many more elements you can learn about as you build more complex pages.",
"interrupted": false,
"usage": {
"input_tokens": 88,
"output_tokens": 691
}
}
],
Here is a snippet that shows the issue:
- https://gist.github.com/evalstate/6b5ca3f67634602f7ce8dd8c3dbab7a3
- Marked Demo
The handling of code blocks in lists changes; asking the LLM via Chat-UI to repeat all or part of the block verbatim shows the behaviour.
The GFM spec recommends using a blank HTML comment to disambiguate indented blocks: https://github.github.com/gfm/#example-288
## Inside a List
- This is a test (normal fences)
```html
<foo />
-
This is another test (indented block)
-
This is a further test (indents and fences)
<foo /> <bar /> -
Test complete
Outside a List
This is a test (normal fences)
<foo />
This is another test (indented block)
<foo />
<bar />
This is another test (indents and fences)
```
<foo />
<bar />
```
Test complete
Final update on this for the moment - the issue also occurs when code blocks are children of lists, causing the parse(token.raw) to show the child codeblock rather than being caught by the type==="code" clause here:
https://github.com/huggingface/chat-ui/blob/97b6feb8b9ed57148e76b11944ace966029ea108/src/lib/components/chat/ChatMessage.svelte#L267-L276
Can't see an obvious quick way to fix this.
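To illustrate why a top-level `type === "code"` check misses these blocks, here is a sketch over a hand-written token tree. The token shape (`type`, `items`, `tokens`, `lang`, `text`) is my assumption of what the lexer output looks like for a list, and `collectCodeTokens` is a hypothetical helper; this is not tested against chat-ui and isn't a proposed fix.

```javascript
// Hand-written tokens (assumed shape): one code block nested in a list item,
// one code block at the top level.
const tokens = [
  { type: "paragraph", text: "Some common tags:" },
  {
    type: "list",
    items: [
      {
        type: "list_item",
        tokens: [
          { type: "text", text: "<html>: the root element" },
          { type: "code", lang: "html", text: "<html>\n</html>" },
        ],
      },
    ],
  },
  { type: "code", lang: "cpp", text: "delete suggestions[i];" },
];

// Hypothetical helper: walk the tree depth-first and collect every code token.
function collectCodeTokens(tokenList, found = []) {
  for (const token of tokenList) {
    if (token.type === "code") found.push(token);
    if (Array.isArray(token.items)) collectCodeTokens(token.items, found);
    if (Array.isArray(token.tokens)) collectCodeTokens(token.tokens, found);
  }
  return found;
}

// A check that only inspects top-level tokens sees one code block...
const topLevelOnly = tokens.filter((token) => token.type === "code");
// ...while a recursive walk also finds the one inside the list item.
const nestedToo = collectCodeTokens(tokens);

console.log(topLevelOnly.length, nestedToo.length); // 1 2
```

Finding the nested tokens is the easy part; the harder part is rendering the surrounding list while swapping only those children for the CodeBlock component.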
Getting this issue with Qwen2.5-Coder-32B-Instruct:
The raw markdown looks like:
### Explanation of the Code
1. **Loop through each `char*` and delete it:**
```cpp
for (size_t i = 0; i < count; i++) {
delete suggestions[i];
suggestions[i] = 0;
}
Seems like the code block produced by Qwen is indented, which usually isn't common, but seems to be more common with this particular model.
It's because it's a child of a bulleted/numbered list. In this case it doesn't use the CodeBlock component but the marked output.
Last reply not helpful - there are 2 separate issues:
- Code blocks that are children of lists don't get rendered via the CodeBlock component.
- Those code blocks render "<" symbols incorrectly.
I can produce a PR for the second issue (I fixed this in my fork but left it as it's not a "complete" fix).
Adding this to ChatMessage fixes the <'s.
renderer.code = (code) => {
return `<pre><code>${sanitizeMd(code.raw)}</code></pre>`;
}