mdex RFC - Extension API

Allow users to inject functions into the pipeline to transform Markdown and HTML.

Suppose you want to turn every heading into an H1 and add a class topic to each of those headings. Today you could either transform the Markdown to inject HTML or transform the generated HTML to rewrite the h tags, i.e. do all the transformations in the Markdown phase or all of them in the HTML phase. Neither is ideal because each phase has its own rules and semantics. So we want to provide a unified API where you can inject each transformation function into the phase where it makes the most sense. In this example the API would look like:

markdown = """
# Get Started

## Install
"""

# Markdown phase: rewrite every heading node to level 1
update_headings_to_level_1 = fn pipeline ->
  tree =
    MDEx.find_and_update(pipeline.tree, "heading", fn
      {"heading", [{"level", _}]} ->
        {"heading", [{"level", 1}]}

      other ->
        other
    end)

  %{pipeline | tree: tree}
end

# HTML phase: add the topic class to every h1 tag
set_topic_class_h1 = fn pipeline ->
  tree =
    MDEx.find_and_update(pipeline.tree, "h1", fn
      {"h1", _} ->
        {"h1", [{"class", "topic"}]}

      other ->
        other
    end)

  %{pipeline | tree: tree}
end

pipeline =
  MDEx.new(markdown: markdown)
  |> MDEx.append_md_steps(
    update_headings_to_level_1: update_headings_to_level_1
  )
  |> MDEx.append_html_steps(
    set_topic_class_h1: set_topic_class_h1
  )

Executing this pipeline results in:

MDEx.run(pipeline)

# <h1 class="topic">Get Started</h1> 
# <h1 class="topic">Install</h1>

If you're familiar with Floki, Req, and MDX you'll feel at home.

In MDX you can add plugins to the Markdown phase (remark plugins) or to the HTML phase (rehype plugins). The idea here is the same, but it uses the Req style of manipulating the pipeline with functions, and the AST format comes from Floki so we can have a unified API for both Markdown and HTML.

Names are subject to change.

Apr 23 '24 14:04 leandrocp

I'd suggest an implementation akin to how Plug and other composable pipeline "things" in Elixir work, where you either provide a function that accepts and returns an object, or a module that implements a behaviour with a callback or two.

A stab at the loose behaviour could be something akin to

defmodule MDEx.Extension do
  @doc "Transforms that occur on the markdown phase"
  @callback pre(pipeline :: term()) :: {:ok, term()} | {:error, term()}
  @doc "Transforms that occur on the HTML phase"
  @callback post(pipeline :: term()) :: {:ok, term()} | {:error, term()}

  # Default implementations pass the pipeline through unchanged
  defmacro __using__(_opts) do
    quote location: :keep do
      @behaviour MDEx.Extension

      @impl MDEx.Extension
      def pre(pipeline), do: {:ok, pipeline}

      @impl MDEx.Extension
      def post(pipeline), do: {:ok, pipeline}

      defoverridable MDEx.Extension
    end
  end
end

You'd then use it like such

defmodule MyExtension do
  use MDEx.Extension

  @impl MDEx.Extension
  def pre(%{tree: tree} = pipeline) do
    tree
    |> MDEx.find_and_update("heading", fn
      {"heading", [{"level", _}]} ->
        {"heading", [{"level", 1}]}

      other ->
        other
    end)
    |> then(&{:ok, %{pipeline | tree: &1}})
  end

  @impl MDEx.Extension
  def post(%{tree: tree} = pipeline) do
    tree
    |> MDEx.find_and_update("h1", fn
      {"h1", _} ->
        {"h1", [{"class", "topic"}]}

      other ->
        other
    end)
    |> then(&{:ok, %{pipeline | tree: &1}})
  end
end

And register it with MDEx in a manner akin to

MDEx.new(markdown: markdown)
|> MDEx.append_extension(MyExtension)

Note that if you pass a function, instead of a module, it would just call the function. There would need to be a way to tag what step this applies to with single functions.
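
To make the tagging idea concrete, here is a rough sketch of one way a dispatcher could accept either a module or a phase-tagged function. ExtensionDispatch, the {phase, fun} tuple shape, and the UpcaseTitle module are all made up for illustration; nothing here is part of MDEx.

```elixir
defmodule ExtensionDispatch do
  # Hypothetical helper, not part of MDEx.
  # A module is expected to export pre/1 and post/1.
  def call(mod, phase, pipeline) when is_atom(mod) and phase in [:pre, :post],
    do: apply(mod, phase, [pipeline])

  # A bare function is tagged with the phase it belongs to: {:pre, fun} or {:post, fun}.
  # It runs only when the tag matches the phase being executed.
  def call({phase, fun}, phase, pipeline) when is_function(fun, 1), do: fun.(pipeline)

  # Tagged for the other phase: pass the pipeline through unchanged.
  def call({_other_phase, fun}, _phase, pipeline) when is_function(fun, 1),
    do: {:ok, pipeline}
end

# Illustration-only extension module operating on a plain map.
defmodule UpcaseTitle do
  def pre(pipeline), do: {:ok, Map.update!(pipeline, :title, &String.upcase/1)}
  def post(pipeline), do: {:ok, pipeline}
end

pipeline = %{title: "install"}
mark_done = {:post, fn p -> {:ok, Map.put(p, :done, true)} end}

{:ok, a} = ExtensionDispatch.call(UpcaseTitle, :pre, pipeline)
{:ok, b} = ExtensionDispatch.call(mark_done, :pre, pipeline)
{:ok, c} = ExtensionDispatch.call(mark_done, :post, pipeline)
```

Here a has the title upcased, b is untouched because the function is tagged for the other phase, and c carries the extra :done key.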

I'll be keeping an eye on this, as I'll probably want to make a clone of the functionality in my djot repo

Aug 23 '24 20:08 paradox460

Hey @paradox460 thanks for sharing your thoughts! I've been thinking about this API and what I currently have in mind is similar to your proposal but using plain functions instead of a module. The design is very similar to Req.Request. I won't say it's identical because there are a few fundamental differences, the main one being that we have parse and format steps as opposed to request and response steps. Parse steps receive the Markdown AST and must return a transformed AST at the end of the pipeline, while format steps output that AST in a friendly format such as HTML, XML, LiveView, and others. That means we can't assume the output is always HTML. So it's definitely based on Req's API, which has the huge benefit of being an API that people are already used to working with, so the barrier to writing and using plugins in that format is lower. I talked to Wojtek about reusing his code and he was super kind to allow it and also supportive of the idea.

Rendering Markdown to HTML with Mermaid graphs would look like:

html =
  MDEx.new(
    markdown: """
    graph TD;
      A-->B;
    """,
    extension: [autolink: true]
    # may pass other options from https://hexdocs.pm/mdex/0.1.18/MDEx.html#to_html/2-options
    # probably need to register those options too
  )
  |> MDEx.Mermaid.attach() # from package :mdex_mermaid to be created yet
  |> MDEx.HTML.attach()

IO.puts(html)

And the plugins:

defmodule MDEx.Mermaid do
  @moduledoc """
  Injects Mermaid JS and renders mermaid code blocks
  """
  
  @required_opts [
    render: [unsafe_: true],
    features: [sanitize: false]
  ]
  
  def attach(%MDEx.Pipe{} = pipe, opts \\ []) do
    pipe
    |> MDEx.Pipe.register_options([:mermaid_version])
    |> MDEx.Pipe.merge_options(opts)
    |> MDEx.Pipe.merge_options(@required_opts) # still not sure the best approach to handle required opts
    |> MDEx.Pipe.append_parse_steps(load_mermaid: &load_mermaid/1)
  end

  defp load_mermaid(parse) do
    # pretty much the same code as https://github.com/leandrocp/mdex/blob/5987418685e87f7ef85babd945416306b56c6536/examples/mermaid.exs#L40-L60 but with a couple changes:
    # Use `options[:mermaid_version]` to load specific version or defaults to latest
    # Only transform `code_blocks` where literal == "mermaid"

    # ... return transformed AST
  end
end

defmodule MDEx.HTML do
  @moduledoc """
  Render Markdown AST as HTML
  """
  
  def attach(%MDEx.Pipe{} = pipe, opts \\ []) do
    pipe
    |> MDEx.Pipe.append_format_steps(to_html: &to_html/1)
  end
  
  defp to_html(pipe) do
    Map.put(pipe, :output, MDEx.to_html(pipe.parse, pipe.options))
  end
end

And the struct holding everything together:

defmodule MDEx.Pipe do
  defstruct [:options, :parse_steps, :format_steps, :output, :halted, :private]
end

I'm not so sure about the name %MDEx.Pipe{} tho, it feels too generic.
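
For anyone trying to picture how such a struct could drive the steps, below is a minimal sketch of a Req-like runner. It assumes steps are stored as keyword lists of name: fun, that parse steps run before format steps, and that a step can stop the run by setting halted. PipeSketch and its tree field are placeholders for illustration, not the final MDEx API.

```elixir
defmodule PipeSketch do
  # Hypothetical stand-in for the proposed struct. The :tree field is an
  # assumption added so the example has something to transform.
  defstruct options: [],
            parse_steps: [],
            format_steps: [],
            tree: nil,
            output: nil,
            halted: false

  # Steps are keyword lists of `name: fun`, kept in the order they were appended.
  def append_parse_steps(%__MODULE__{} = pipe, steps),
    do: %{pipe | parse_steps: pipe.parse_steps ++ steps}

  def append_format_steps(%__MODULE__{} = pipe, steps),
    do: %{pipe | format_steps: pipe.format_steps ++ steps}

  # Lets a step stop the pipeline early.
  def halt(%__MODULE__{} = pipe), do: %{pipe | halted: true}

  # Runs parse steps first, then format steps, stopping once a step halts.
  def run(%__MODULE__{} = pipe) do
    Enum.reduce_while(pipe.parse_steps ++ pipe.format_steps, pipe, fn {_name, step}, acc ->
      case step.(acc) do
        %__MODULE__{halted: true} = halted -> {:halt, halted}
        %__MODULE__{} = next -> {:cont, next}
      end
    end)
  end
end

# A string stands in for the AST to keep the example short.
upcase = fn pipe -> %{pipe | tree: String.upcase(pipe.tree)} end
wrap = fn pipe -> %{pipe | output: "<p>" <> pipe.tree <> "</p>"} end

pipe =
  %PipeSketch{tree: "hello"}
  |> PipeSketch.append_parse_steps(upcase: upcase)
  |> PipeSketch.append_format_steps(wrap: wrap)
  |> PipeSketch.run()

IO.puts(pipe.output)
```

Because each step receives and returns the whole struct, a plugin's attach function only has to append steps and merge options, which is the same shape as the Mermaid and HTML plugins above.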

Sep 03 '24 14:09 leandrocp

I’m currently playing with this and I want to modify the approach in mermaid.exs to only inject the JavaScript on pages where Mermaid is present. This isn't possible with the current implementation of MDEx.traverse_and_update/2, but it would be useful if the extension API and/or traverse_and_update/2 could update the options or some other state (private?) so that decisions like this could be made based on what is actually present, rather than on what is assumed might be present.

The above is correct for MDEx.traverse_and_update/2, but MDEx.Traversal.traverse_and_update/3 will do this, although it is not documented and not exposed from the top level.
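
To illustrate the kind of traversal meant here, this is a small self-contained sketch of a traverse-and-update with an accumulator over Floki-style {tag, attrs, children} tuples. It is not the MDEx.Traversal implementation; TraverseSketch and the "info"/"literal" attribute names on code_block are assumptions made for the example.

```elixir
defmodule TraverseSketch do
  # Post-order traversal over {tag, attrs, children} tuples that threads an
  # accumulator through every node. Hypothetical helper, not MDEx code.
  def traverse_and_update({tag, attrs, children}, acc, fun) do
    {children, acc} = Enum.map_reduce(children, acc, &traverse_and_update(&1, &2, fun))
    fun.({tag, attrs, children}, acc)
  end

  # Leaves such as plain text are passed through untouched.
  def traverse_and_update(leaf, acc, _fun), do: {leaf, acc}
end

# Assumed node shapes, for illustration only.
tree =
  {"document", [],
   [
     {"heading", [{"level", 1}], ["Diagram"]},
     {"code_block", [{"info", "mermaid"}, {"literal", "graph TD;\n  A-->B;"}], []}
   ]}

# The accumulator records whether any mermaid code block was seen.
{_tree, has_mermaid?} =
  TraverseSketch.traverse_and_update(tree, false, fn
    {"code_block", attrs, _children} = node, acc ->
      {node, acc or ({"info", "mermaid"} in attrs)}

    node, acc ->
      {node, acc}
  end)

# Only pages that actually contain a diagram would get the script injected.
if has_mermaid?, do: IO.puts("inject mermaid script")
```

The same accumulator could just as well be written into the pipeline's options or private state so that later steps can read it.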

Nov 29 '24 02:11 halostatue

@halostatue I'm almost done with some improvements on the API that will expose traverse_and_update with an accumulator (acc argument) and also an implementation of Access and Enumerable protocols to let you query/search the AST more easily.

Nov 29 '24 03:11 leandrocp

This seems like something I could use as well.

Feb 23 '25 08:02 jdmarshall