turndown
Custom inline element causing whitespace issues
Found this just today and immediately wanted to use it for a project involving converting SSML (speech synthesis markup language) to markdown.
I got the boilerplate working, but when I added one of my rules I kept getting extra whitespace (code below):
const TurndownService = require('turndown');
const turndownService = new TurndownService();

// <speak> is the SSML root element: pass its content through unchanged
turndownService.addRule('ssml', {
  filter: ['speak'],
  replacement: function (content) {
    return content;
  }
});

// <s> marks a sentence: pass its content through unchanged
turndownService.addRule('sentence', {
  filter: ['s'],
  replacement: function (content) {
    return content;
  }
});

// <break /> marks a pause: render it as a comma
turndownService.addRule('pause', {
  filter: ['break'],
  replacement: function (content) {
    return ',' + content;
  }
});

const markdown = turndownService.turndown('<speak><s>Hello<break /> world!</s></speak>');
console.log(markdown);
This outputs:
Hello ,world!
Instead of the expected:
Hello, world!
Debugging the code, I can only think the parser is keeping a blank space where the element was and trimming the line up to the next word instead of preserving the blank space.
For now I'm using a dirty hack: a string replace on the output that swaps the space and the comma (SSML doesn't use comma characters).
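For reference, the workaround amounts to something like the sketch below. The helper name fixPauseSpacing is made up for illustration, and it assumes every comma in the output comes from the 'pause' rule, which only holds because SSML itself doesn't use commas.

```javascript
// Hypothetical helper (not part of Turndown): swaps a misplaced " ," into ", ".
// Assumes commas are only ever emitted by the 'pause' rule.
function fixPauseSpacing(markdown) {
  return markdown.replace(/ ,/g, ', ');
}

console.log(fixPauseSpacing('Hello ,world!')); // Hello, world!
```

It works for this case, but it runs over the whole output string, so it would also rewrite any " ," that legitimately appears in the text.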
Any suggestions on a cleaner solution, especially if I end up in the same situation with another custom element? (I'm hoping to make a plugin, so I want everything to be as clean as possible.)
I've not heard of SSML, so this looks interesting! I'm not quite sure what the issue is, but I have an idea, which I'll jot down below:
During the conversion process, Turndown examines the element as well as the surrounding elements to check for whitespace, and then adds any if necessary. This is to ensure that an input of Hello<em> world</em> gets converted to Hello _world_ rather than Hello_ world_ which is not valid markdown :/ Part of the checks involve determining whether the surrounding elements are block level or not. Given that these are not defined as block or not, this might be causing some confusion.
https://github.com/domchristie/turndown/blob/2063e423a565f588f67cea8b68a002844bf08e33/src/node.js#L20-L60
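To make the whitespace handling described above concrete, here is a much simplified sketch. It is not Turndown's actual code (see the link for that), and convertInline is a made-up name. It also assumes, as one possible explanation for the reported output, that the HTML parser treats <break /> as an opening tag, so that the element ends up wrapping ' world!'.

```javascript
// Simplified illustration only: NOT Turndown's actual implementation.
// Whitespace at the edges of an inline element's content is moved outside
// the replacement so that delimiters sit directly against the text.
function convertInline(content, replacement) {
  const leading = /^\s/.test(content) ? ' ' : '';
  const trailing = /\s$/.test(content) ? ' ' : '';
  return leading + replacement(content.trim()) + trailing;
}

// Emphasis: the leading space ends up before the underscore.
const emphasis = (content) => '_' + content + '_';
console.log('Hello' + convertInline(' world', emphasis)); // Hello _world_

// Assumption: <break> wraps ' world!', so the same logic puts the space
// before the comma that the 'pause' rule prepends.
const pause = (content) => ',' + content;
console.log('Hello' + convertInline(' world!', pause)); // Hello ,world!
```

If that assumption holds, the stray space is the element's own leading whitespace being moved in front of the replacement, rather than a gap left where the element used to be.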
Cool, yeah. I assumed the type of block was inferred from the declaration, but it does make more sense to set them explicitly so users can't use the wrong syntax.
I did add the new elements to the relevant blockElements and voidElements arrays by editing the source code directly (I added a console.log to make sure it was running the right file), but it didn't change the output.
Guess I need to dig deeper.
Also, as a side note, I went ahead and created a plugin, and it was easier to make than I thought.
https://github.com/Truemedia/turndown-ssml
It only covers the core subset of SSML, but that's already enough to handle all the SSML templates I have written to date.