hiccup icon indicating copy to clipboard operation
hiccup copied to clipboard

Data convention for raw string representation

Open borkdude opened this issue 7 months ago • 10 comments

In https://github.com/nextjournal/markdown we are generating a Markdown string -> AST -> Hiccup representation. Markdown HTML blocks and inline tags are raw HTML and we can't represent those as raw strings in hiccup in a library-independent way. E.g.:

# Hello

<img src="foo.jpg"/>

is rendered to something like:

[:div 
  [:h1 "Hello"]
  [??? "<img src=\"foo.jpg\"/>]

where the image is raw HTML. Preferably we don't want to directly couple the library to a specific hiccup library. Calling hiccup2/raw to represent the raw node would couple the library to this project. Clojure libraries that can communicate via data should maybe not force specific dependencies on people if it can be avoided. Another issue is platform independence. https://github.com/nextjournal/markdown works on both the JVM and in ClojureScript. For ClojureScript we don't know if the data is going to be rendered with Reagent or some other library. So I'm going to ping a couple of alternative hiccup library authors here as well.

It would be nice to have an implementation-agnostic data-centered way to represent raw nodes.

In https://github.com/borkdude/html I had chosen the [:$ "<img src=\"...\"/>] tag for this since :$ looks as if it cringes on the unescaped HTML. @weavejester suggested on Slack to use [:= ... instead to indicate that this tag should set the literal HTML provided. I'm fine with that. Hopefully that does not clash with some template language that looks like HTML that people use as a target. Another idea would be to use something like [:div {:dangerouslySetInnerHTML "..."] (React) but I'm not sure if looking at properties when generating HTML has a significant performance penalty.

Pinging @weavejester (hiccup), @escherize (huff), @Deraen (reagent), @lilactown (Helix). Not sure if it makes sense to ping more people, if we can make this a standard between all of these libraries, I think it's a standard ;-).

For the safety aspect of this feature, see https://github.com/escherize/huff/issues/5#issuecomment-2892156583.

borkdude avatar May 19 '25 20:05 borkdude

For Reagent, I likely wouldn't add a such feature. Reagent now has a special handling for React dangerouslySetInnerHTML property that only allows values tagged with a specific type. This is to prevent values from EDN and Transit data from being able to include these properties, even if they were parsed to React elements: https://github.com/reagent-project/reagent/blob/master/doc/Security.md#hiccup-data

Note: the comments in huff issue about function call not being more secure than Hiccup data doesn't hold for Reagent: with Reagent and React, dangerouslySetInnerHTML property is the only way to add JS code to React elements, React doesn't allow JS in any other places (like script tags or onLoad etc. attributes.)

Deraen avatar May 20 '25 07:05 Deraen

@Deraen:

Note: the comments in huff issue about function call not being more secure than Hiccup data doesn't hold for Reagent: with Reagent and React, dangerouslySetInnerHTML property is the only way to add JS code to React elements, React doesn't allow JS in any other places (like script tags or onLoad etc. attributes.)

Thanks for the clarification. Let me make sure I understand:

  1. The function approach is more secure if it's the only way to inject JS and cannot be bypassed with any other EDN, which is true for Reagent/React with dangerouslySetInnerHTML. Since Huff allows :script contents and JS event attributes as strings, it's possible to bypass the function, so it doesn't add any real security.
  2. In both Reagent/React and Huff, it's still possible for attackers do bad things (such as adding :style attributes) when rendering untrusted EDN, so it's necessary to sanitize before rendering regardless.

Is that right?

rads avatar May 20 '25 08:05 rads

Is the ask to support a [:= "<img src=\"...\"/>] tag? With huff, as of 0.2.13 you can add that functionality from userspace! I was just starting to think that no-one would ever need to use the custom extensions. :) Here's the test that shows how it's done.

Then again if we're making it a standard, I guess we could add that as part of the "hiccup spec" and just accept :='s by default.

How does that sound?

escherize avatar May 20 '25 21:05 escherize

I think that's good, as it means there will be a common baseline for handling raw strings. I was also thinking a formal Hiccup specification document would be a good idea.

weavejester avatar May 20 '25 21:05 weavejester

Found another hiccup here: https://github.com/onionpancakes/chassis so I'm pinging @onionpancakes too.

borkdude avatar May 27 '25 12:05 borkdude

I'm hesitant to overload global keywords for non-elements as a default, so I'm going to stay on the hammock for now. Currently in chassis, users can override a protocol to implement this feature for themselves.

onionpancakes avatar May 29 '25 07:05 onionpancakes

users can override a protocol to implement this feature for themselves

Just to re-iterate, the idea is to have a non-library-specific way of doing this so other libraries can produce standard hiccup without tying it to one specific implementation. So a library-specific function or protocol wouldn't solve that problem. I guess the library producing the hiccup could advice people to do a post-walk on the hiccup or have some configuration so produce the raw element, as an alternative solution.

borkdude avatar May 29 '25 08:05 borkdude

Here's another proposal which should be less controversial.

Replicant (@cjohansen's project) has a way of registering custom tags, something like:

(require '[replicant.alias :as ra])
(ra/register :foo/bar ...)

Then, when rendering hiccup, the [:foo/bar ...] element dispatches to a function that was registered. To solve the problem experienced in this issue, users could register [:raw/html ] or so to produce raw HTML using

(hiccup2/register-alias! :raw/html hiccup2/raw-string)

The nextjournal.markdown library could then produce [:raw/html ..] tags without depending on a specific library.

borkdude avatar Jun 18 '25 09:06 borkdude

Was the original proposal controversial?

Many libraries allow custom behavior, but isn't the point to decide upon a standard?

weavejester avatar Jun 18 '25 13:06 weavejester

You're right, without a standard you either need to configure the hiccup library or the library producing the hiccup (which is the case now with nextjournal.markdown), so making it configurable in the hiccup library is just moving the problem around. Standard we need.

borkdude avatar Jun 19 '25 08:06 borkdude