hickory
hickory copied to clipboard
Compatible with nodejs
Issue #17 reports an incompatibility with nodejs where loading fails due to missing DOM bindings.
I have resolved this issue by replacingjs/Node
with goog.dom/NodeType
and made js/NodeList
optional, which eliminates the dependency on browser dom, allowing hickory to load also on nodejs.
Interesting!
Is it only about loading or can you parse html in node too?
It's primarily to complete the loading, ready for a polyfill that makes hickory able to parse html in nodejs.
Is this something that is useful by itself, if you still can't parse the HTML?
With the commit hickory.core/parse works in nodejs after setting js/DOMParser to one of the available DOM parsers available for node. I currently use xmldom:
(set! js/DOMParser (.-DOMParser (nodejs/require "xmldom")))
For full use of hickory I also have a hairy polyfill to accommodate that hickory extends NodeList instead of using standard dom to iterate over childnodes and attributes. A minor additional change to hickory would eliminate the need for this patch.
But making hickory loadable on nodejs has value in itself, as it doesn't become a showstopper for using other modules that depends on hickory. Case in point is kioo.
Ok, I don't understand the nodelist issue (or most of what you said really), but I think we don't want to stop anyone's show. Can you explain what the further fix that obviates this patch is?
Sorry for not explaining more clearly. The problem for compatibility with node is that hickory.core (cljs) assumes that there is a js/NodeList
that can be made seqable using extend-type-with-seqable
. However, js/NodeList
is not available on NodeJS.
The quick fix is a wrapper that coerces all dom nodelists into a sequence, for example using a function like:
(defn- as-seq [nodelist]
(if (seq? nodelist) nodelist (array-seq nodelist)))
Then use as-seq
to wrap accessors for the "childNodes" and "attributes" properties to ensure they always provide a sequence that can be used with map:
(as-seq (aget this "childNodes"))
(as-seq (aget this "attributes"))
Or perhaps as-seq should just integrate the aget? Your choice.
With these commits, hickory can be used on nodejs to parse markup into hickory or hiccup format, after setting js/DOMParser to one of the available DOM parsers available for node, such as:
(set! js/DOMParser (.-DOMParser (nodejs/require "xmldom")))
Can we not simply use array-seq
everywhere? Not needs to go through extend-type
if we can avoid it.
We can indeed simply use array-seq
everywhere and eliminate extend-type
. I wrote the commit with the intent to maintain compatibility in case any uses had come to depend on hickory extending js/NodeLists
to become seqable, including cases where the NodeList of the dom nodes from parse
already is seqable. Note that using array-seq
on an object that already is a seq
may lead to unexpected results so better test for seq?
before using array-seq
:
(array-seq (array-seq (.-childNodes (parse "<p>abc<br>def</p>"))))
=> nil
I don't think we need to keep this compatibility. I don't remember exactly why I choose this option when porting to ClojureScript but in retrospect it looks unnecessarily complex.
including cases where the NodeList of the dom nodes from parse already is seqable
Can this happen?
My opinion is we should just go for the simple case. Do you mind modifying the PR to achieve that? Appart from this I think it makes sense to merge then.
Can this happen?
Yes, nodes from parse
may already be seqable if some other module extends js/NodeList
to be seqable. So it makes sense to test for seq?
before using array-seq
on nodelists.
I'll modify the PR for the simple case, eliminating the extend-type
on js/NodeList
.
Right that may happen. I do think it is considered bad practice to do this.
Harmless to add a seq?
check anyway.
I'd expect the seq?
check to be harmless and have insignificant impact on performance. An alternative though is to use DOM to iterate over the child nodes instead of mapping over coerced nodelists.
Extending js/NodeList
to be seqable is practiced in the wild, e.g. in LightTable and Domina. So it makes sense to guard by checking for seq?
.
Good catch!
Looks great now, probably ready to be merged. @davidsantiago what do you think?
I'm fine with an API break and issuing a new minor version, if that's the right thing to do. I'd rather just solve it and avoid any associated complexity. If you're happy with the patch, I'd just like someone to give me a little snippet for the readme change notes so I can be sure I am explaining what changes.
There's no real API breakage beyond the js/NodeList
not being extended anymore but that was an implementation detail never intended to be used outside hickory.
Tentative change notes:
- fixed an issue where text nodes were parsed twice under some circumstances
- hickory can now be used with
nodejs
(seeREADME
for configuration details) - !
js/NodeList
are not extended anymore so iterating over those nodes will require usage ofarray-seq
fixed an issue where text nodes were parsed twice under some circumstances
More accurate to say: text nodes were repeated as the issue could cause text nodes to be parsed and repeated multiple times.
are not extended anymore
Consider instead saying no longer extended to be seqable.
I am now looking into node more for Macchiato. What's the status of this PR?
Yep, sorry about that, this one fell off my radar as other things have been placed above it on my stack. I'll try to find some time tonight to get this merged in.
Is there help needed to get this released? nodejs support would be greatly appreciated.
Yes, very much. I don't know javascript or node or clojurescript really, and there are a number of issues and pull requests pending that I simply don't understand. There has been some discussion and debate in some of them that I don't understand. And just in a more general sense I don't understand what node support entails or what's missing or when it would be done. I have been meaning to do some studying and figure out what is going on in them, but I simply don't have the time to get up to speed for various life reasons.
If you can understand what is going on in pending node PRs and issues and figure out what is needed and what is ready to merge in, I'm going to defer to anyone who knows what they are doing on this environment and can help me figure out what to put in, what changes to make, and what to leave out (if anything).
I would love it to get this working. I actually have this code: `(ns nodetest.core (:require [cljs.nodejs :as nodejs] [hickory.core :as h]))
(nodejs/enable-util-print!)
(set! js/DOMParser (.-DOMParser (cljs.nodejs/require "xmldom")))
(defn -main [] (println (h/as-hiccup (h/parse "<a href="foo">foo <a id="so" href="bar">bar<script src="blah.js">alert("hi");"))))
(set! main-cli-fn -main)`
But when I run this node target/server_dev/nodetest.js
I get the following error:
`/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1088
:else (throw (js/Error. (str coll " is not ISeqable"))))))
^
Error: false is not ISeqable
at Object.cljs$core$seq [as seq] (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1088:20)
at cljs$core$empty_QMARK_ (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1884:20)
at hickory$core$format_doctype (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/hickory/core.cljs:55:15)
at hickory.core.as_hiccup.object (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/hickory/core.cljs:66:37)
at hickory$core$as_hiccup (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/hickory/core.cljs:11:1)
at /home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:4215:16
at cljs.core.map.cljs$core$IFn$_invoke$arity$2 (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:4215:15)
at cljs.core.LazySeq.sval (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:3010:18)
at cljs.core.LazySeq.cljs$core$ISeqable$_seq$arity$1 (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:3052:12)
at Object.cljs$core$seq [as seq] (/home/wanderwichtl/Documents/clojure/nodetest/target/server_dev/cljs/core.cljs:1075:25)
` I am new to Clojure and don't have the best debug workflow at this time, so help is always appreciated.
Likely xmldom requires well-formed XML markup, with each start tag having a matching end tag:
"<a href='foo'>foo</a> <a id='so' href='bar'>bar</a> <script src='blah.js'>alert('hi');</script>"
If it still fails, I suggest you trim the markup down to a single element:
"<a href='foo'>foo</a>"
As you're new to Clojure, have you actually gone through the extra steps of installing the "node" branch of Hickory rather than just using a dependency on the stock version that doesn't yet support node? If so, I award you the coveted lein install badge, please use your new powers with care.
Very nice patch!
I got https://github.com/mpcarolin/markdown-to-hiccup working in nodejs (via lumo 1.8) by using this fork, installing jsdom
via npm, and finally "polyfilling" something like this:
(let [jsdom (cljs.nodejs/require "jsdom")]
(set! (. goog.global -Node) goog.dom.NodeType)
(set! (. goog.global -document) (.-document (.-window (jsdom.JSDOM.)))))
(markdown-to-hiccup.core/md->hiccup "*hello*")
=> [:div {} [:p {:style {:margin 0}} [:em {} hello]]]
Hi folks,
I'm trying to get Hickory to work on NodeJS courtesy of this PR/branch..
However, with the following code (at the repl):
#?(:cljs (require '[cljs.core.async :as async]))
(require #?(:clj '[clj-http.client :as http] :cljs '[cljs-http.client :as http]))
(require '[hickory.core :as h])
(require '[hickory.select :as hs])
(set! js/XMLHttpRequest (cljs.nodejs/require "xhr2"))
(set! (. goog.global -Node) goog.dom.NodeType)
(set! (. goog.global -NodeList) goog.dom.NodeList)
(def jsdom (cljs.nodejs/require "jsdom"))
(set! (. goog.global -document) (.-document (.-window (jsdom.JSDOM.))))
(async/go (let [response (async/<! (http/get "https://www.google.co.uk"))]
(h/as-hickory (h/parse (:body response)))))
I get the following error:
Error: No matching clause:
at /mnt/d/Projects/hickory-experiment/.cljs_node_repl/hickory/core.cljs:89:22
at m__4244__auto__ (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/hickory/core.cljs:19:1)
at f (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:4680:16)
at cljs.core.map.cljs$core$IFn$_invoke$arity$2 (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.js:17602:3)
at cljs.core.LazySeq.fn [as sval] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:3394:18)
at cljs.core.LazySeq.cljs$core$ISeqable$_seq$arity$1 (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:3384:1)
at Object.coll [as seq] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:1210:25)
at Function.cljs.core.seq_reduce.cljs$core$IFn$_invoke$arity$3 (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:2445:26)
at cljs.core.LazySeq.cljs.core.seq_reduce.cljs$core$IFn$_invoke$arity$3 [as cljs$core$IReduce$_reduce$arity$3] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:3458:28)
at Function.coll [as cljs$core$IFn$_invoke$arity$3] (/mnt/d/Projects/hickory-experiment/.cljs_node_repl/cljs/core.cljs:2517:29)
This is with HEAD
of the njordhov:node
branch (@njordhov) 🙂
The following equivalent Clojure code works fine:
#?(:cljs (require '[cljs.core.async :as async]))
(require #?(:clj '[clj-http.client :as http] :cljs '[cljs-http.client :as http]))
(require '[hickory.core :as h])
(require '[hickory.select :as hs])
(h/as-hickory (h/parse (:body (http/get "https://www.google.co.uk"))))
What am I doing wrong? Any ideas?
Thanks!
The likely cause for the No matching clause
error is that in jsdom, attributes have no nodeType property, consistent with DOM4 where element attributes are no longer implementing the Node interface. As a consequence there is no clause for an attribute in as-hickory
, causing it to fail when recursively processing the attributes of elements:
(map as-hickory (aget this "attributes"))
A work-around is to patch the Attribute constant to be consistent with jsdom:
(set! hickory.core/Attribute nil)
A more permanent fix is for as-hickory
and as-hiccup
to no longer recur over element attributes but instead recode them in place. See issue #36 for details.
PS: Be careful setting js/document
in node as it is used by prominent js modules to determine whether or not it is run in a nodejs context (bad idea). Consider instead to bind it dynamically:
(async/go
(let [response (async/<! (http/get "https://www.google.co.uk"))]
(with-redefs [js/document (.-document (.-window (jsdom.JSDOM.)))]
(h/as-hickory (h/parse (:body response))))))
Hi @njordhov thanks for the help!
(set! hickory.core/Attribute nil)
seems to do the trick, but when I additionally try to use the with-redefs
style binding I cannot use the surrounding let
to define jsdom
(I get ReferenceError: jsdom is not defined
), so I have to use a global def
:
#?(:cljs (require '[cljs.core.async :as async]))
(require #?(:clj '[clj-http.client :as http] :cljs '[cljs-http.client :as http]))
#?(:cljs (require '[cljs.nodejs :as nodejs]))
(require '[hickory.core :as h])
(require '[hickory.select :as hs])
(set! js/XMLHttpRequest (nodejs/require "xhr2"))
#?(:cljs (def jsdom (nodejs/require "jsdom")))
(async/go
(let [response (async/<! (http/get "https://www.google.co.uk"))]
(with-redefs [js/Node goog.dom.NodeType
js/NodeList goog.dom.NodeList
js/document (.-document (.-window (jsdom.JSDOM.)))
hickory.core/Attribute nil]
(h/as-hickory (h/parse (:body response))))))
@slipset Any thoughts on if this PR is ok to land? Looks like the main feedback was addressed and then approved at https://github.com/clj-commons/hickory/pull/33#issuecomment-199550029. Encountered same failure as #17
@slipset Needed to merge this before we can move on to running tests on Node.js without a browser. If tests fail we can revert.