reversemarkdown-net
Request: Add a config option to avoid rendering images as Base64
I have HTML files that contain inline image data. When I convert them to Markdown, the images are embedded as Base64, which makes the resulting Markdown very large.
An option to prevent this would be nice.
Acknowledged. I will see what we can best do. I am currently traveling, so I will only be able to get back to this in a couple of days.
In the interim, you can pre-process your HTML content with HtmlAgilityPack, as below, to remove those img elements:
using System;
using HtmlAgilityPack;

string html = @"
<html>
<body>
<img src='data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA' />
<img src='https://example.com/image.jpg' />
</body>
</html>";

// Load the HTML document
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);

// Select all <img> tags (SelectNodes returns null when nothing matches)
var imageNodes = document.DocumentNode.SelectNodes("//img");
if (imageNodes != null)
{
    foreach (var img in imageNodes)
    {
        string src = img.GetAttributeValue("src", string.Empty);

        // Inline images use a data: URI; compare ordinally and ignore case
        if (src.StartsWith("data:image/", StringComparison.OrdinalIgnoreCase))
        {
            // Remove the <img> node from the HTML
            img.Remove();
        }
    }
}

// Save or display the cleaned HTML
string cleanedHtml = document.DocumentNode.OuterHtml;
Console.WriteLine(cleanedHtml);
That is actually my current solution :) I preprocess with HtmlAgilityPack and then convert to Markdown :).
Thanks.
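For anyone landing here later, the preprocess-then-convert workaround discussed in this thread could be sketched end to end roughly as follows. This is only a sketch under assumptions, not a built-in feature of the library: the class name `InlineImageStripper` and the method `ConvertWithoutInlineImages` are hypothetical names made up for illustration, and the code assumes the HtmlAgilityPack and ReverseMarkdown NuGet packages are referenced and that the converter is used with its default settings.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper, not part of ReverseMarkdown or HtmlAgilityPack.
public static class InlineImageStripper
{
    // Removes <img> elements whose src is a data:image/ URI,
    // then converts the remaining HTML to Markdown.
    public static string ConvertWithoutInlineImages(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null when no <img> elements are present.
        var imageNodes = document.DocumentNode.SelectNodes("//img");
        if (imageNodes != null)
        {
            foreach (var img in imageNodes)
            {
                string src = img.GetAttributeValue("src", string.Empty);
                if (src.StartsWith("data:image/", StringComparison.OrdinalIgnoreCase))
                {
                    img.Remove();
                }
            }
        }

        // Assumption: default converter configuration is sufficient here.
        var converter = new ReverseMarkdown.Converter();
        return converter.Convert(document.DocumentNode.OuterHtml);
    }
}
```

Calling `InlineImageStripper.ConvertWithoutInlineImages(html)` with the sample HTML above should drop the Base64 image and leave only the `https://example.com/image.jpg` image in the Markdown, though the exact output depends on the converter configuration.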