Is it possible to convert only a specific list of tags and leave the rest as plain text?

Status: Open · sikri-eic opened this issue 3 years ago · 4 comments

I looked through the documentation and examples but couldn't find anything about this. I want to convert a handful of tags (<p>, <li>, <a>) to Markdown and the rest to plain text. I was wondering if there is a filtering mechanism where:

  • I can specify the tags I want to convert to Markdown
  • I can provide a format for converting the <a> tag (I want it to render as "text (link)")

sikri-eic, Feb 21 '22 07:02

These are quite custom requirements, so I would suggest adding a pre-processing step that uses HtmlAgilityPack to convert or process the relevant HTML nodes as you need, and then passing the resulting HTML to the Markdown conversion.

If you look at the source code, you can learn how I am using HtmlAgilityPack internally.
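A minimal sketch of the pre-processing step suggested above, assuming the HtmlAgilityPack NuGet package. The HtmlPreprocessor class and its KeepTags whitelist are hypothetical names chosen for illustration, not part of either library; the whitelist and the "text (href)" link format follow the request in this issue. Non-whitelisted elements are unwrapped (the tag is dropped, its children stay in place) and each <a> is replaced by a plain text node.

```csharp
using System.Linq;
using HtmlAgilityPack;

// Hypothetical pre-processing helper (not part of ReverseMarkdown or HtmlAgilityPack).
public static class HtmlPreprocessor
{
	// Assumed whitelist: only these tags are left for the Markdown converter.
	private static readonly string[] KeepTags = { "p", "ul", "ol", "li" };

	public static string Preprocess(string html)
	{
		var doc = new HtmlDocument();
		doc.LoadHtml(html);

		// Snapshot the elements in reverse document order, so descendants are
		// handled before their ancestors and the tree can be mutated safely.
		var elements = doc.DocumentNode.Descendants()
			.Where(n => n.NodeType == HtmlNodeType.Element)
			.Reverse()
			.ToList();

		foreach (var node in elements)
		{
			if (node.Name == "a")
			{
				// Replace the link with plain text in the form "text (href)".
				var href = node.GetAttributeValue("href", "");
				var text = href.Length > 0 ? $"{node.InnerText} ({href})" : node.InnerText;
				node.ParentNode.ReplaceChild(doc.CreateTextNode(text), node);
			}
			else if (!KeepTags.Contains(node.Name))
			{
				// Unwrap: drop the tag itself but keep its children in place.
				node.ParentNode.RemoveChild(node, true);
			}
		}

		return doc.DocumentNode.OuterHtml;
	}
}
```

The returned HTML can then be passed to new ReverseMarkdown.Converter().Convert(...) as usual. This sketch also unwraps <head>, <script>, and <style>, so their contents become text; remove those nodes first if the input can contain them.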

mysticmind, Feb 21 '22 07:02

Quick follow-up note: I think there is room to extend PassThroughTags with an additional option to render the tags as text rather than HTML. Let me have a look and get back to you.

mysticmind, Feb 21 '22 08:02

Thank you for the prompt response. I started looking at the source, and I think it may be simpler to create a CustomConverter (which doesn't exist yet) that reuses the converters you have already implemented. In that case, instead of finding all IConverter implementations and adding them to the _converters dictionary, I would just add the ones I want to use. But there is a problem with this approach: all classes implementing IConverter require a Converter in their constructor. If that parameter were an interface instead, I could implement that interface myself. The interface could look something like this:

public interface ITopLevelConverter // I know, bad name :)
{
	Config Config { get; }
	string Convert(string html);
	void Register(string tagName, IConverter converter);
	IConverter Lookup(string tagName);
}

sikri-eic, Feb 21 '22 08:02

I was able to hack something together using your code:

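using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;
using ReverseMarkdown.Converters;

// Only the registered tags are converted to Markdown; every other tag falls back to _innerTextConverter.
public class CustomConverter : ReverseMarkdown.Converter
{
	// Whitelist: tag name -> ReverseMarkdown converter for that tag
	private readonly IDictionary<string, IConverter> _converters = new Dictionary<string, IConverter>();
	// Fallback converter for all tags not in the whitelist
	private readonly IConverter _innerTextConverter;

	public CustomConverter()
	{
		// Reuse the library's own converters for the tags that should become Markdown
		_converters["p"] = new P(this);
		_converters["li"] = new Li(this);
		_converters["ol"] = new Ol(this);

		_innerTextConverter = new InnerText(this);
	}

	// 'new' hides the base class Convert rather than overriding it
	public new string Convert(string html)
	{
		html = ReverseMarkdown.Cleaner.PreTidy(html, Config.RemoveComments);

		var doc = new HtmlDocument();
		doc.LoadHtml(html);

		var root = doc.DocumentNode;

		// Start from <body> if present, ignoring <head> etc.
		if (root.Descendants("body").Any())
		{
			root = root.SelectSingleNode("//body");
		}

		var result = Lookup(root.Name).Convert(root);

		return result.Trim();
	}

	// Returns the whitelisted converter for tagName, or the plain-text fallback
	public new IConverter Lookup(string tagName)
	{
		return _converters.TryGetValue(tagName, out var converter) ? converter : _innerTextConverter;
	}
}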

As you can see, this is not ideal (it relies on hiding members of the base class), but it seems to work. Do you think this could be an extension point for the library? (BTW: since this now works for me, I don't really need it implemented in the library.)

sikri-eic, Feb 21 '22 08:02