Is it possible to convert only a list of tags and leave the rest as plain text?
I looked through the documentation and examples but couldn't find anything about this. I want to convert a handful of tags (<p>, <li>, <a>) to markdown and the rest to plain text. I was wondering if there is a filtering mechanism where:
- I can specify the tags I want to convert to markdown
- I can provide a format for converting the <a> tag (I want it to look like: text (link))
These are quite custom requirements, so I would suggest a pre-processing step: use HtmlAgilityPack to convert or process the relevant HTML nodes as you need, then pass the resulting HTML on for Markdown conversion.
If you look at the source code, you can see how I use HtmlAgilityPack internally.
Quick follow-on note: I think there is room to extend PassThroughTags with an additional option to render as text rather than HTML. Let me have a look and get back to you.
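For illustration, the pre-processing step suggested above might look roughly like the sketch below. It assumes the goal is to turn each link into plain "text (link)" before conversion. `AnchorPreprocessor` and `FlattenAnchors` are hypothetical names invented for this example; only the HtmlAgilityPack calls are real library API, and this is not how the library itself handles links.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class AnchorPreprocessor
{
    // Hypothetical helper (not part of ReverseMarkdown): replaces every
    // <a href="..."> element with a plain text node of the form "text (link)".
    public static string FlattenAnchors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null, not an empty collection, when nothing matches.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            // Copy to a list so the document can be modified while iterating.
            foreach (var anchor in anchors.ToList())
            {
                var href = anchor.GetAttributeValue("href", string.Empty);
                var replacement = doc.CreateTextNode($"{anchor.InnerText} ({href})");
                anchor.ParentNode.ReplaceChild(replacement, anchor);
            }
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

The returned HTML no longer contains any <a> elements, so it can be handed to the converter's Convert(string html) method as usual and the links stay in the chosen plain-text form.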
Thank you for the prompt response. I started looking at the source, and I think it may be simpler to create a CustomConverter (which doesn't exist yet) that reuses the converters you have already implemented. In that case, instead of finding all IConverter implementations and adding them to the _converters dictionary, I would add only the ones I want to use. There is a problem with this approach, though: every class implementing IConverter requires a Converter instance in its constructor. If the constructor took an interface instead, I could implement that interface. It could look something like:
public interface ITopLevelConverter // I know, bad name :)
{
    Config Config { get; }
    string Convert(string html);
    void Register(string tagName, IConverter converter);
    IConverter Lookup(string tagName);
}
I could hack something together using your code:
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using ReverseMarkdown.Converters;

public class CustomConverter : ReverseMarkdown.Converter
{
    private readonly IDictionary<string, IConverter> _converters = new Dictionary<string, IConverter>();
    private readonly IConverter _innerTextConverter;

    public CustomConverter()
    {
        // Only these tags are converted to Markdown.
        _converters["p"] = new P(this);
        _converters["li"] = new Li(this);
        _converters["ol"] = new Ol(this);
        // Fallback for every other tag (InnerText is assumed to be defined elsewhere).
        _innerTextConverter = new InnerText(this);
    }

    // Hides Converter.Convert so that the lookup below is used.
    public new string Convert(string html)
    {
        html = ReverseMarkdown.Cleaner.PreTidy(html, Config.RemoveComments);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var root = doc.DocumentNode;

        // ensure to start from body and ignore head etc
        if (root.Descendants("body").Any())
        {
            root = root.SelectSingleNode("//body");
        }

        var result = Lookup(root.Name).Convert(root);
        return result.Trim();
    }

    // Hides Converter.Lookup: unregistered tags fall back to plain text.
    public new IConverter Lookup(string tagName)
    {
        return _converters.TryGetValue(tagName, out var converter) ? converter : _innerTextConverter;
    }
}
As you can see, this is not ideal (it hides members of the base class), but it seems to work. Do you think this could be an extension point for the library? (BTW: since this now works for me, I don't really need it to be implemented in the library.)