
Request: Add a config option to avoid rendering images as Base64

Open alkampfergit opened this issue 11 months ago • 3 comments

I have HTML files that contain inline image data. When I convert them to Markdown, the images end up embedded as Base64, and I'd like an option to avoid this. It is a problem because the resulting Markdown is very large.

An option that prevents this would be nice.

alkampfergit avatar Dec 18 '24 11:12 alkampfergit

Acknowledged, I have seen this and will look into the best way to handle it. I am currently traveling, so I will only be able to get back to you on this in a couple of days.

mysticmind avatar Dec 18 '24 12:12 mysticmind

In the interim, you can pre-process your HTML content with HtmlAgilityPack, as below, to remove the img elements that carry inline image data:

using System;
using HtmlAgilityPack;

string html = @"
    <html>
        <body>
            <img src='' />
            <img src='https://example.com/image.jpg' />
        </body>
    </html>";

// Load HTML document
HtmlDocument document = new HtmlDocument();
document.LoadHtml(html);

// Select all <img> tags
var imageNodes = document.DocumentNode.SelectNodes("//img");

if (imageNodes != null)
{
    foreach (var img in imageNodes)
    {
        string src = img.GetAttributeValue("src", string.Empty);

        if (src.StartsWith("data:image/"))
        {
            // Remove the <img> node from the HTML
            img.Remove();
        }
    }
}

// Save or display the cleaned HTML
string cleanedHtml = document.DocumentNode.OuterHtml;
Console.WriteLine(cleanedHtml);

mysticmind avatar Dec 18 '24 12:12 mysticmind

That is actually my current solution :) pre-processing with HtmlAgilityPack and then converting to Markdown :).

Thanks.

alkampfergit avatar Dec 18 '24 13:12 alkampfergit
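
For anyone landing here later, the two steps discussed above (strip the inline images, then convert) can be chained roughly as follows. This is only a sketch under a few assumptions: the InlineImageStripper class and StripInlineImages helper are hypothetical names made up for illustration and are not part of ReverseMarkdown or HtmlAgilityPack, the sample Base64 payload is a dummy value, and the check matches any data: URI rather than only data:image/. It assumes both NuGet packages (HtmlAgilityPack and ReverseMarkdown) are referenced.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class InlineImageStripper
{
    // Hypothetical helper (not a library API): removes every <img> whose
    // src attribute is a data: URI and returns the remaining HTML.
    public static string StripInlineImages(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // SelectNodes returns null when nothing matches the XPath.
        var imageNodes = document.DocumentNode.SelectNodes("//img");
        if (imageNodes != null)
        {
            // Copy to a list so removal never interferes with enumeration.
            foreach (var img in imageNodes.ToList())
            {
                string src = img.GetAttributeValue("src", string.Empty);
                if (src.TrimStart().StartsWith("data:", StringComparison.OrdinalIgnoreCase))
                {
                    img.Remove();
                }
            }
        }

        return document.DocumentNode.OuterHtml;
    }

    public static void Main()
    {
        // Dummy payload: "AAAA" stands in for real Base64 image data.
        string html = "<p>Before</p>"
            + "<img src='data:image/png;base64,AAAA' />"
            + "<img src='https://example.com/image.jpg' />";

        string cleanedHtml = StripInlineImages(html);

        // Convert the cleaned HTML with ReverseMarkdown's default configuration.
        var converter = new ReverseMarkdown.Converter();
        string markdown = converter.Convert(cleanedHtml);

        Console.WriteLine(markdown);
    }
}
```

The image with a regular URL is left in place, so it should still be converted to a normal Markdown image link. Only the inline data is dropped before conversion.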