
Provide helpful examples showing compression

Open jiangts opened this issue 10 years ago • 0 comments

I tried a couple of test inputs with a lot of duplication to test the src/core.cljs file as per your README.

Invariably, using de-dupe made the value of (count (prn-str data)) larger than the original!

Can you post an example of de-dupe helping compress data?

Here's what I tried:

(def data1 {:contacts
            [{:first "Ben" :last "Bitdiddle" :email "[email protected]"}
             {:first "Alyssa" :middle-initial "P" :last "Hacker" :email "[email protected]"}
             {:first "Eva" :middle "Lu" :last "Ator" :email "[email protected]"}
             {:first "Louis" :last "Reasoner" :email "[email protected]"}
             {:first "Cy" :middle-initial "D" :last "Effect" :email "[email protected]"}
             {:first "Lem" :middle-initial "E" :last "Tweakit" :email "[email protected]"}]})
(def data2 {:contacts
            [{:first "Ben" :last "Bitdiddle" :email "[email protected]"}
             {:first "Alyssa" :middle-initial "P" :last "Hacker" :email "[email protected]"}
             {:first "Louis" :last "Reasoner" :email "[email protected]"}
             {:first "Cy" :middle-initial "D" :last "Effect" :email "[email protected]"}
             {:first "Lem" :middle-initial "E" :last "Tweakit" :email "[email protected]"}]})
(def data3 {:contacts []})

(def some-data [data1 data2 data3])

(def compressed (de-dupe some-data))
;  if you now compare
(println "compressed:" (count (prn-str compressed)))
(println "original:" (count (prn-str some-data)))
;  you will see the degree of compression
(println "compressed:" compressed)
(println "original:" some-data)

;  to recover your original data
(def some-data2 (expand compressed))
(println "recovered" some-data2)

And I got:

compressed: 1132
original: 817
...
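
To make the size comparison repeatable across test inputs, it can be wrapped in a small helper. This is only a sketch: `printed-size-report` is a hypothetical name, not part of de-dupe, and it takes the transform as an argument so the snippet runs on its own with `identity` as a stand-in.

```clojure
;; Hypothetical helper (not part of de-dupe): compares the printed size of a
;; value before and after applying a transform function f.
(defn printed-size-report
  [f data]
  (let [before (count (prn-str data))
        after  (count (prn-str (f data)))]
    {:original    before
     :transformed after
     ;; a ratio above 1.0 means the transformed value prints larger
     :ratio       (/ after (double before))}))

;; `identity` is a stand-in so this runs without the library loaded.
(println (printed-size-report identity {:contacts []}))
```

Passing `de-dupe` in place of `identity` with `some-data` should reproduce the numbers above, where 1132 / 817 is a ratio of roughly 1.39, i.e. the "compressed" form prints about 39% larger.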

jiangts · Aug 04 '15 21:08