clj-tagsoup

Add usage examples

Open: ustun opened this issue 10 years ago (2 comments)

Once the HTML is parsed, how can I most efficiently query the parsed document? That is, I would like to be able to drill down into it as if it were a map:

(get-in x [:html :head :title])

It would be great if you added some recommendations on how to do that transformation (for example, https://github.com/cjohansen/hiccup-find looks promising).

ustun, Oct 16 '14 11:10
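To see why `get-in` with tag keywords finds nothing, it helps to look at the shape of the parsed tree. The sketch below uses a hand-written literal instead of calling clj-tagsoup, assuming the parser returns hiccup-style vectors of the form `[tag attrs & children]` with the attribute map always present; the name `doc` is purely illustrative.

```clojure
;; Hand-written stand-in for a parsed document.
;; Assumed shape: [tag attrs & children], attrs map always present.
(def doc
  [:html {}
   [:head {} [:title {} "Hello"]]
   [:body {} [:p {:class "intro"} "Hi there"]]])

;; Vectors are indexed by position, not by keyword, so this returns nil:
(get-in doc [:html :head :title])

;; Positional lookup does work: index 0 is the tag keyword, index 1 the
;; attribute map, so index 2 is the first child element.
(get-in doc [2 2 2]) ;; => "Hello"
```

Positional paths break as soon as the markup gains or loses a sibling element, which is why a lookup keyed on tag names is more robust.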

Ditto. As a Clojure noob, I find this vector thing confusing as hell.

collinalexbell, Mar 04 '15 16:03
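The vector form is less confusing once each element is read as `[tag attrs & children]` and taken apart with destructuring. A minimal sketch, assuming elements have that shape; `describe-node` is a hypothetical helper, not part of clj-tagsoup, that turns one element into a plain map.

```clojure
(defn describe-node
  "Splits a hiccup-style element vector into its three parts.
  Hypothetical helper; assumes the shape [tag attrs & children]."
  [[tag attrs & children]]
  {:tag      tag               ; keyword such as :p
   :attrs    attrs             ; attribute map, e.g. {:class "intro"}
   :children (vec children)})  ; text strings and nested element vectors

(describe-node [:p {:class "intro"} "Hi " [:b {} "there"]])
;; => {:tag :p, :attrs {:class "intro"}, :children ["Hi " [:b {} "there"]]}
```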

Thanks for chiming in!

A quick-and-dirty solution could look something like this (untested, might be buggy):

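(defn get-in-html
  "Follows a path of tag keywords down a parsed tree, returning the first
  matching element at each step, or nil if the path is not found."
  [tree [tag & tags]]
  (if tag
    (when tree
      ;; (rest tree) skips the tag keyword; only element vectors can match
      (recur (first (filter #(and (vector? %) (= (first %) tag)) (rest tree))) tags))
    tree))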

Note that you'd call it as (get-in-html x [:head :title]), leaving out :html, since x itself is the :html element.

This is very simplistic and only supports paths made of tag names. If you want to extract arbitrary subtrees, you may want to take a look at Enlive. (When I have free time, I intend to explore integrating clj-tagsoup with Enlive, as I feel both projects might benefit from it.)

nathell, Mar 04 '15 16:03
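Beyond fixed paths, plain `clojure.core/tree-seq` can already collect every element with a given tag anywhere in the document, which may cover simple cases before reaching for Enlive. A minimal sketch assuming `[tag attrs & children]` vectors; `select-tag` and `node-text` are hypothetical helpers, not clj-tagsoup API, and `page` is a hand-written stand-in for parsed output.

```clojure
(defn select-tag
  "Returns every element whose tag is `tag`, in depth-first document order."
  [tree tag]
  (->> tree
       ;; Element vectors are branches; their children are the element
       ;; vectors after the tag keyword (text and the attrs map are skipped).
       (tree-seq vector? #(filter vector? (rest %)))
       (filter #(= tag (first %)))))

(defn node-text
  "Concatenates all text found under an element."
  [node]
  (cond
    (string? node) node
    ;; (rest node) skips the tag; the attrs map falls through to ""
    (vector? node) (apply str (map node-text (rest node)))
    :else ""))

(def page
  [:html {}
   [:body {}
    [:p {} "one"]
    [:div {} [:p {} "two " [:b {} "three"]]]]])

(map node-text (select-tag page :p)) ;; => ("one" "two three")
```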