Prevent source content from getting entirely replaced by CKEditor5's normalized HTML output?
I'm looking for advice on getting around the fact that CKEditor5 always normalizes source data.
The docs say that is core behavior that cannot be changed (link), but this is a big problem if the user or developer cares about the source, for example using CKE5 to edit pre-existing HTML or markdown content. My own use case is a VS Code extension for viewing and editing .md files.
Is there any guidance for use cases like this?
The problems are...
- All non-renderable content is lost, including <!-- comment -->, <script>, <meta>, and <style>
- Syntax gets changed, e.g. the tag <em> becomes <i> and Markdown bullets - become *
- All formatting and indentation gets lost
Is there any way CKEditor5 could be modified to optionally normalize only the chunks of source that were modified, instead of the entire thing? If I were to solve this without rewriting the CKEditor5 internals, I think I would have to do something like...
- Run the CKE normalization on both the original source and editor output
- Perform a diff between them to identify blocks that actually changed
- Merge those changes back into the original source
But that seems like a fragile solution, so I'm hoping for feedback.
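To make the three steps concrete, here is a rough sketch of what I have in mind. It is not tied to the CKEditor 5 API: `toyNormalize` is a hypothetical stand-in for "load the block into the editor and read it back", and `splitBlocks`, `lcsTable`, and `mergeEdits` are names I made up. It also assumes blocks are separated by blank lines, which real Markdown does not guarantee.

```typescript
// Hypothetical stand-in for the editor's normalization. It only strips HTML
// comments, rewrites "- " bullets to "* ", and trims trailing whitespace.
type Normalize = (block: string) => string;

const toyNormalize: Normalize = (block) =>
  block
    .replace(/<!--[\s\S]*?-->/g, "")
    .split("\n")
    .map((line) => line.replace(/^(\s*)- /, "$1* ").trimEnd())
    .join("\n");

// Assumption: blank lines separate blocks (paragraphs, lists, comments).
const splitBlocks = (source: string): string[] => source.split(/\n{2,}/);

// Longest-common-subsequence table over two lists of blocks.
function lcsTable(a: string[], b: string[]): number[][] {
  const t: number[][] = [];
  for (let i = 0; i <= a.length; i++) {
    t.push(new Array<number>(b.length + 1).fill(0));
  }
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      t[i][j] =
        a[i] === b[j] ? t[i + 1][j + 1] + 1 : Math.max(t[i + 1][j], t[i][j + 1]);
    }
  }
  return t;
}

function mergeEdits(original: string, editorOutput: string, normalize: Normalize): string {
  const origBlocks = splitBlocks(original);
  const origNorm = origBlocks.map(normalize); // step 1: normalize the original
  const outBlocks = splitBlocks(editorOutput); // editor output is already normalized
  const t = lcsTable(origNorm, outBlocks); // step 2: diff at block level
  const merged: string[] = [];
  let i = 0;
  let j = 0;
  // Step 3: walk both lists and prefer the original text wherever possible.
  while (j < outBlocks.length) {
    if (i < origNorm.length && origNorm[i] === outBlocks[j]) {
      merged.push(origBlocks[i]); // unchanged block: keep the original text
      i++;
      j++;
    } else if (i < origNorm.length && t[i + 1][j] >= t[i][j + 1]) {
      // Original block has no counterpart in the output. If it normalized to
      // nothing (e.g. a comment), the editor never saw it, so keep it.
      if (origNorm[i] === "") merged.push(origBlocks[i]);
      i++;
    } else {
      merged.push(outBlocks[j]); // new or changed block: take the editor's version
      j++;
    }
  }
  for (; i < origBlocks.length; i++) {
    if (origNorm[i] === "") merged.push(origBlocks[i]);
  }
  return merged.join("\n\n");
}
```

Even in this toy version, a block that was actually edited still comes back fully normalized (its `-` bullets turn into `*`), and the original spacing between blocks is not preserved, which is part of why I consider the approach fragile.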
Hi! I think this is very unlikely to happen, and creating something like this would be a huge task. Everything that the editor parses is represented in an internal model structure. We don't have em or i; we have a text node with an italic attribute. All of the features operate on this abstract structure, and the output is just a translation of it into the desired format.
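As a rough illustration of why the original tag cannot be recovered, here is a toy sketch. It does not use the real CKEditor 5 classes; `TextRun`, `toModel`, and `toHtml` are made-up names, and the parser only understands plain text plus `<em>` and `<i>`.

```typescript
// Toy model, not CKEditor 5's real classes: a run of text with an optional attribute.
interface TextRun {
  text: string;
  italic?: true;
}

// Input conversion: both <em> and <i> collapse into the same italic attribute.
function toModel(html: string): TextRun[] {
  const runs: TextRun[] = [];
  const pattern = /<(em|i)>(.*?)<\/\1>|([^<]+)/g;
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(html)) !== null) {
    if (m[2] !== undefined) {
      runs.push({ text: m[2], italic: true });
    } else if (m[3] !== undefined) {
      runs.push({ text: m[3] });
    }
  }
  return runs;
}

// Output conversion: the model only knows "italic", so a single tag is chosen.
function toHtml(runs: TextRun[]): string {
  return runs.map((r) => (r.italic ? `<i>${r.text}</i>` : r.text)).join("");
}
```

Once the input has passed through `toModel`, nothing in the resulting structure says which tag was used, so `toHtml` has to pick one for every italic run.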
- Run the CKE normalization on both the original source and editor output
- Perform a diff between them to identify blocks that actually changed
- Merge those changes back into the original source
This could be one of the solutions, but it would make the getData operation even heavier than it is today. Creating diffing and merging heuristics would also be challenging for sure.
- All non-renderable content is lost, including <!-- comment -->, <script>, <meta>, and <style>
Have you tried features like HTML Comments or Full Page? I'm not sure how they would behave with the markdown output TBH.
- Syntax gets changed, e.g. the tag <em> becomes <i> and Markdown bullets - become *
- All formatting and indentation gets lost
Is it a case of always outputting what was inputted, or a matter of preferences? Both the editor API and the markdown output could be configured in some way.
Is it a case of always outputting what was inputted, or a matter of preferences?
The former. For example, when using my plugin to just fix a single typo in a README.md, the entire file gets modified in unrelated and destructive ways. Changing - to *, removing comments, and autoformatting are examples of that.
Have you tried features like HTML Comments or Full Page? I'm not sure how they would behave with the markdown output TBH.
Markdown comments still get lost with HTML Comments, and Full Page breaks the rendering, causing the entire source to render as a single paragraph element. The GeneralHtmlSupport feature also seems relevant.
This could be one of the solutions, but it would make the getData operation even heavier than it is today. Creating diffing and merging heuristics would also be challenging for sure.
After some more thought it seems like solutions would fall into these categories:
- Modify CKE5 fundamentally to render from raw source instead of maintaining its own internal structure.
  - I'm assuming this is not viable
- Let CKE5 normalize the entire output, and only merge the minimum required parts of that back into the input.
  - My diffing idea is this
- Make CKE5's internal model support all HTML/markdown blocks and properties, so the conversion into CKE5's internal model and back is not lossy.
  - From what I can tell, that's how the Full Page and HTML Comment plugins work
And I'm thinking it might make more sense to try improving on the last category instead of the diffing? I had some questions about this...
- Could the problem of losing source formatting be solved by checking the leading space in front of each element when parsing input to create the internal model, and then just storing it as a property on the model node, similar to how element properties like id, class, and data-* get saved under htmlPAttributes in the Model when the GeneralHtmlSupport plugin is enabled?
- Could you similarly store details like original tag type as an attribute? So <em> and <i> would both still become <paragraph> with the italic: true attribute, but also have another attribute like tag: "em".
- How would you approach getting this to work with markdown as well as HTML?
- I noticed that the HTML Comments and Full Page plugins encode the contents and positions of comments and non-renderable HTML elements into the root element. I'm curious why that method was chosen instead of just adding invisible <comment> or <meta> nodes stored in the Model tree alongside other nodes like <paragraph>?
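To show what I mean by the first two questions, here is a toy sketch of a model that keeps the source details as extra attributes. None of this is CKEditor 5 API: `sourceTag` and `leadingSpace` are hypothetical attribute names, and the parser only handles `<p>`, `<em>`, and `<i>` on a single line each.

```typescript
// Toy model only; `sourceTag` and `leadingSpace` are hypothetical attributes.
interface SourceAwareRun {
  text: string;
  italic?: true;
  sourceTag?: "em" | "i"; // which tag produced the italic attribute
}

interface SourceAwareBlock {
  leadingSpace: string; // whitespace that preceded the <p> in the source
  runs: SourceAwareRun[];
}

function parseBlocks(html: string): SourceAwareBlock[] {
  const blocks: SourceAwareBlock[] = [];
  const blockPattern = /(\s*)<p>(.*?)<\/p>/g;
  let b: RegExpExecArray | null;
  while ((b = blockPattern.exec(html)) !== null) {
    const runs: SourceAwareRun[] = [];
    const runPattern = /<(em|i)>(.*?)<\/\1>|([^<]+)/g;
    let r: RegExpExecArray | null;
    while ((r = runPattern.exec(b[2])) !== null) {
      if (r[2] !== undefined) {
        runs.push({ text: r[2], italic: true, sourceTag: r[1] as "em" | "i" });
      } else if (r[3] !== undefined) {
        runs.push({ text: r[3] });
      }
    }
    blocks.push({ leadingSpace: b[1], runs });
  }
  return blocks;
}

function serializeBlocks(blocks: SourceAwareBlock[]): string {
  return blocks
    .map((block) => {
      const inner = block.runs
        .map((run) => {
          if (!run.italic) return run.text;
          const tag = run.sourceTag ?? "i"; // runs created in the editor fall back to <i>
          return `<${tag}>${run.text}</${tag}>`;
        })
        .join("");
      return `${block.leadingSpace}<p>${inner}</p>`;
    })
    .join("");
}
```

With these two extra attributes, a source like `<p>One <em>two</em></p>` followed by an indented `<p><i>three</i></p>` serializes back to the same string, while runs that never had a source tag still get the default output.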
hi @adamerose
I just ran into an issue of my own when trying HTML Comment and Full Page. My site uses CKEditor to prepare email content and send it. My email content contains both tags and HTML comments.
When FullPage is enabled, the tag is retained. When HTML Comment is enabled, the HTML comment is retained. But when you combine these two plugins, only the tag is retained. I have not found any solution for this case.
@ducviethaboo Consider reordering the plugins. Make sure that the HTMLComment plugin is added after FullPage.