hickory icon indicating copy to clipboard operation
hickory copied to clipboard

Compatible with nodejs

Open njordhov opened this issue 8 years ago • 28 comments

Issue #17 reports an incompatibility with nodejs where loading fails due to missing DOM bindings.

I have resolved this issue by replacingjs/Node with goog.dom/NodeType and made js/NodeList optional, which eliminates the dependency on browser dom, allowing hickory to load also on nodejs.

njordhov avatar Mar 01 '16 00:03 njordhov

Interesting!

Is it only about loading or can you parse html in node too?

jeluard avatar Mar 02 '16 22:03 jeluard

It's primarily to complete the loading, ready for a polyfill that makes hickory able to parse html in nodejs.

njordhov avatar Mar 03 '16 06:03 njordhov

Is this something that is useful by itself, if you still can't parse the HTML?

davidsantiago avatar Mar 03 '16 18:03 davidsantiago

With the commit hickory.core/parse works in nodejs after setting js/DOMParser to one of the available DOM parsers available for node. I currently use xmldom:

(set! js/DOMParser (.-DOMParser (nodejs/require "xmldom")))

For full use of hickory I also have a hairy polyfill to accommodate that hickory extends NodeList instead of using standard dom to iterate over childnodes and attributes. A minor additional change to hickory would eliminate the need for this patch.

But making hickory loadable on nodejs has value in itself, as it doesn't become a showstopper for using other modules that depends on hickory. Case in point is kioo.

njordhov avatar Mar 03 '16 19:03 njordhov

Ok, I don't understand the nodelist issue (or most of what you said really), but I think we don't want to stop anyone's show. Can you explain what the further fix that obviates this patch is?

davidsantiago avatar Mar 03 '16 21:03 davidsantiago

Sorry for not explaining more clearly. The problem for compatibility with node is that hickory.core (cljs) assumes that there is a js/NodeList that can be made seqable using extend-type-with-seqable. However, js/NodeList is not available on NodeJS.

The quick fix is a wrapper that coerces all dom nodelists into a sequence, for example using a function like:

(defn- as-seq [nodelist]                                                                                                              
  (if (seq? nodelist) nodelist (array-seq nodelist)))                                                                                 

Then use as-seq to wrap accessors for the "childNodes" and "attributes" properties to ensure they always provide a sequence that can be used with map:

(as-seq (aget this "childNodes"))
(as-seq (aget this "attributes"))

Or perhaps as-seq should just integrate the aget? Your choice.

njordhov avatar Mar 05 '16 04:03 njordhov

With these commits, hickory can be used on nodejs to parse markup into hickory or hiccup format, after setting js/DOMParser to one of the available DOM parsers available for node, such as:

(set! js/DOMParser (.-DOMParser (nodejs/require "xmldom")))

njordhov avatar Mar 05 '16 04:03 njordhov

Can we not simply use array-seq everywhere? Not needs to go through extend-type if we can avoid it.

jeluard avatar Mar 05 '16 08:03 jeluard

We can indeed simply use array-seq everywhere and eliminate extend-type. I wrote the commit with the intent to maintain compatibility in case any uses had come to depend on hickory extending js/NodeLists to become seqable, including cases where the NodeList of the dom nodes from parse already is seqable. Note that using array-seq on an object that already is a seq may lead to unexpected results so better test for seq? before using array-seq:

(array-seq (array-seq (.-childNodes (parse "<p>abc<br>def</p>"))))
=> nil  

njordhov avatar Mar 06 '16 00:03 njordhov

I don't think we need to keep this compatibility. I don't remember exactly why I choose this option when porting to ClojureScript but in retrospect it looks unnecessarily complex.

including cases where the NodeList of the dom nodes from parse already is seqable

Can this happen?

My opinion is we should just go for the simple case. Do you mind modifying the PR to achieve that? Appart from this I think it makes sense to merge then.

jeluard avatar Mar 20 '16 08:03 jeluard

Can this happen?

Yes, nodes from parse may already be seqable if some other module extends js/NodeList to be seqable. So it makes sense to test for seq? before using array-seq on nodelists.

I'll modify the PR for the simple case, eliminating the extend-type on js/NodeList.

njordhov avatar Mar 20 '16 21:03 njordhov

Right that may happen. I do think it is considered bad practice to do this. Harmless to add a seq? check anyway.

jeluard avatar Mar 21 '16 06:03 jeluard

I'd expect the seq? check to be harmless and have insignificant impact on performance. An alternative though is to use DOM to iterate over the child nodes instead of mapping over coerced nodelists.

njordhov avatar Mar 21 '16 21:03 njordhov

Extending js/NodeList to be seqable is practiced in the wild, e.g. in LightTable and Domina. So it makes sense to guard by checking for seq?.

njordhov avatar Mar 21 '16 21:03 njordhov

Good catch!

Looks great now, probably ready to be merged. @davidsantiago what do you think?

jeluard avatar Mar 21 '16 21:03 jeluard

I'm fine with an API break and issuing a new minor version, if that's the right thing to do. I'd rather just solve it and avoid any associated complexity. If you're happy with the patch, I'd just like someone to give me a little snippet for the readme change notes so I can be sure I am explaining what changes.

davidsantiago avatar Mar 22 '16 00:03 davidsantiago

There's no real API breakage beyond the js/NodeList not being extended anymore but that was an implementation detail never intended to be used outside hickory.

Tentative change notes:

  • fixed an issue where text nodes were parsed twice under some circumstances
  • hickory can now be used with nodejs (see README for configuration details)
  • ! js/NodeList are not extended anymore so iterating over those nodes will require usage of array-seq

jeluard avatar Mar 22 '16 06:03 jeluard

fixed an issue where text nodes were parsed twice under some circumstances

More accurate to say: text nodes were repeated as the issue could cause text nodes to be parsed and repeated multiple times.

are not extended anymore

Consider instead saying no longer extended to be seqable.

njordhov avatar Mar 22 '16 21:03 njordhov

I am now looking into node more for Macchiato. What's the status of this PR?

ricardojmendez avatar Dec 14 '16 06:12 ricardojmendez

Yep, sorry about that, this one fell off my radar as other things have been placed above it on my stack. I'll try to find some time tonight to get this merged in.

davidsantiago avatar Dec 14 '16 22:12 davidsantiago

Is there help needed to get this released? nodejs support would be greatly appreciated.

michihuber avatar Mar 08 '17 10:03 michihuber

Yes, very much. I don't know javascript or node or clojurescript really, and there are a number of issues and pull requests pending that I simply don't understand. There has been some discussion and debate in some of them that I don't understand. And just in a more general sense I don't understand what node support entails or what's missing or when it would be done. I have been meaning to do some studying and figure out what is going on in them, but I simply don't have the time to get up to speed for various life reasons.

If you can understand what is going on in pending node PRs and issues and figure out what is needed and what is ready to merge in, I'm going to defer to anyone who knows what they are doing on this environment and can help me figure out what to put in, what changes to make, and what to leave out (if anything).

davidsantiago avatar Mar 08 '17 18:03 davidsantiago

I would love it to get this working. I actually have this code: `(ns nodetest.core (:require [cljs.nodejs :as nodejs] [hickory.core :as h]))

(nodejs/enable-util-print!)

(set! js/DOMParser (.-DOMParser (cljs.nodejs/require "xmldom")))

(defn -main [] (println (h/as-hiccup (h/parse "<a href="foo">foo <a id="so" href="bar">bar<script src="blah.js">alert("hi");"))))

(set! main-cli-fn -main)`

But when I run this node target/server_dev/nodetest.js I get the following error: `/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1088 :else (throw (js/Error. (str coll " is not ISeqable")))))) ^ Error: false is not ISeqable at Object.cljs$core$seq [as seq] (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1088:20)

at cljs$core$empty_QMARK_ (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1884:20)

at hickory$core$format_doctype (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/hickory/core.cljs:55:15)

at hickory.core.as_hiccup.object (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/hickory/core.cljs:66:37)

at hickory$core$as_hiccup (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/hickory/core.cljs:11:1)

at /home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:4215:16
at cljs.core.map.cljs$core$IFn$_invoke$arity$2 (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:4215:15)

at cljs.core.LazySeq.sval (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:3010:18)

at cljs.core.LazySeq.cljs$core$ISeqable$_seq$arity$1 (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:3052:12)

at Object.cljs$core$seq [as seq] (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1075:25)

` I am new to Clojure and don't have the best debug workflow at this time, so help is always appreciated.

Wanderwichtl avatar Jul 09 '17 20:07 Wanderwichtl

Likely xmldom requires well-formed XML markup, with each start tag having a matching end tag:

"<a href='foo'>foo</a> <a id='so' href='bar'>bar</a> <script src='blah.js'>alert('hi');</script>"

If it still fails, I suggest you trim the markup down to a single element:

"<a href='foo'>foo</a>"

As you're new to Clojure, have you actually gone through the extra steps of installing the "node" branch of Hickory rather than just using a dependency on the stock version that doesn't yet support node? If so, I award you the coveted lein install badge, please use your new powers with care.

njordhov avatar Jul 11 '17 21:07 njordhov

Very nice patch!

I got https://github.com/mpcarolin/markdown-to-hiccup working in nodejs (via lumo 1.8) by using this fork, installing jsdom via npm, and finally "polyfilling" something like this:

(let [jsdom (cljs.nodejs/require "jsdom")]
    (set! (. goog.global -Node) goog.dom.NodeType)
    (set! (. goog.global -document) (.-document (.-window (jsdom.JSDOM.)))))

(markdown-to-hiccup.core/md->hiccup "*hello*")
=> [:div {} [:p {:style {:margin 0}} [:em {} hello]]]

grav avatar Jul 05 '18 22:07 grav

Hi folks,

I'm trying to get Hickory to work on NodeJS courtesy of this PR/branch..

However, with the following code (at the repl):

#?(:cljs (require '[cljs.core.async :as async]))
(require #?(:clj '[clj-http.client :as http] :cljs '[cljs-http.client :as http]))
(require '[hickory.core :as h])
(require '[hickory.select :as hs])

(set! js/XMLHttpRequest (cljs.nodejs/require "xhr2"))
(set! (. goog.global -Node) goog.dom.NodeType)
(set! (. goog.global -NodeList) goog.dom.NodeList)
(def jsdom (cljs.nodejs/require "jsdom"))
(set! (. goog.global -document) (.-document (.-window (jsdom.JSDOM.))))

(async/go (let [response (async/<! (http/get "https://www.google.co.uk"))]
  (h/as-hickory (h/parse (:body response)))))

I get the following error:

Error: No matching clause:
    at /mnt/d/Projects/hickory-experiment/.cljs_node_repl/hickory/core.cljs:89:22
    at m__4244__auto__ (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/hickory/core.cljs:19:1)
    at f (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:4680:16)
    at cljs.core.map.cljs$core$IFn$_invoke$arity$2 (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.js:17602:3)
    at cljs.core.LazySeq.fn [as sval] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:3394:18)
    at cljs.core.LazySeq.cljs$core$ISeqable$_seq$arity$1 (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:3384:1)
    at Object.coll [as seq] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:1210:25)
    at Function.cljs.core.seq_reduce.cljs$core$IFn$_invoke$arity$3 (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:2445:26)
    at cljs.core.LazySeq.cljs.core.seq_reduce.cljs$core$IFn$_invoke$arity$3 [as cljs$core$IReduce$_reduce$arity$3] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:3458:28)
    at Function.coll [as cljs$core$IFn$_invoke$arity$3] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:2517:29)

This is with HEAD of the njordhov:node branch (@njordhov) 🙂

The following equivalent Clojure code works fine:

#?(:cljs (require '[cljs.core.async :as async]))
(require #?(:clj '[clj-http.client :as http] :cljs '[cljs-http.client :as http]))
(require '[hickory.core :as h])
(require '[hickory.select :as hs])

(h/as-hickory (h/parse (:body (http/get "https://www.google.co.uk"))))

What am I doing wrong? Any ideas?

Thanks!

alza-bitz avatar Dec 13 '18 22:12 alza-bitz

The likely cause for the No matching clause error is that in jsdom, attributes have no nodeType property, consistent with DOM4 where element attributes are no longer implementing the Node interface. As a consequence there is no clause for an attribute in as-hickory, causing it to fail when recursively processing the attributes of elements:

(map as-hickory (aget this "attributes")) 

A work-around is to patch the Attribute constant to be consistent with jsdom:

(set! hickory.core/Attribute nil)

A more permanent fix is for as-hickory and as-hiccup to no longer recur over element attributes but instead recode them in place. See issue #36 for details.

PS: Be careful setting js/document in node as it is used by prominent js modules to determine whether or not it is run in a nodejs context (bad idea). Consider instead to bind it dynamically:

(async/go
  (let [response (async/<! (http/get "https://www.google.co.uk"))]
    (with-redefs [js/document  (.-document (.-window (jsdom.JSDOM.)))]
      (h/as-hickory (h/parse (:body response))))))

njordhov avatar Dec 14 '18 03:12 njordhov

Hi @njordhov thanks for the help!

(set! hickory.core/Attribute nil) seems to do the trick, but when I additionally try to use the with-redefs style binding I cannot use the surrounding let to define jsdom (I get ReferenceError: jsdom is not defined), so I have to use a global def:

#?(:cljs (require '[cljs.core.async :as async]))
(require #?(:clj '[clj-http.client :as http] :cljs '[cljs-http.client :as http]))
#?(:cljs (require '[cljs.nodejs :as nodejs]))
(require '[hickory.core :as h])
(require '[hickory.select :as hs])

(set! js/XMLHttpRequest (nodejs/require "xhr2"))

#?(:cljs (def jsdom (nodejs/require "jsdom")))

(async/go
  (let [response (async/<! (http/get "https://www.google.co.uk"))]
    (with-redefs [js/Node goog.dom.NodeType
                  js/NodeList goog.dom.NodeList
                  js/document (.-document (.-window (jsdom.JSDOM.)))
                  hickory.core/Attribute nil]
      (h/as-hickory (h/parse (:body response))))))

alza-bitz avatar Dec 18 '18 01:12 alza-bitz

@slipset Any thoughts on if this PR is ok to land? Looks like the main feedback was addressed and then approved at https://github.com/clj-commons/hickory/pull/33#issuecomment-199550029. Encountered same failure as #17

logseq-cldwalker avatar Oct 18 '22 19:10 logseq-cldwalker

@slipset Needed to merge this before we can move on to running tests on Node.js without a browser. If tests fail we can revert.

borkdude avatar Nov 20 '22 11:11 borkdude