data.fressian icon indicating copy to clipboard operation
data.fressian copied to clipboard

Write handlers might want to warn when writing structs with variable component counts

Open aphyr opened this issue 5 years ago • 0 comments

This is, I think, not a bug in Fressian. It might be considered a documentation bug? Or maybe a lack of implementation safeguards? Regardless, I read the documentation and fucked this up badly for several years, so I'm writing this up as a resource for others.

I've been hitting EOF errors when trying to deserialize Fressian records written by Jepsen. It looked like if a custom handler is used to write two objects, the second object somehow gets the component-count of the first. For instance:

(require '[clojure.data.fressian :as fress])
(import (org.fressian.handlers WriteHandler ReadHandler))

(def custom-write-handlers
  (-> {clojure.lang.PersistentHashSet
       {"persistent-hash-set" (reify WriteHandler
                                (write [_ w set]
                                  (prn :write-tag "persistent-hash-set"
                                       :component-count (count set))
                                  (.writeTag w "persistent-hash-set"
                                             (count set))
                                  (doseq [e set]
                                    (prn :write-element e)
                                    (.writeObject w e))))}}
      (merge fress/clojure-write-handlers)
      fress/associative-lookup
      fress/inheritance-lookup))

(def custom-read-handlers
  (-> {"persistent-hash-set" (reify ReadHandler
                               (read [_ rdr tag component-count]
                                 (prn :read-tag tag :component-count component-count)
                                 (loop [i component-count
                                        s (transient #{})]
                                   (if (pos? i)
                                     (recur (dec i)
                                            (let [o (.readObject rdr)]
                                              (prn :read-object o)
                                              (conj! s o)))
                                     (let [s (persistent! s)]
                                       (prn :completely-read s)
                                       s)))))}
      (merge fress/clojure-read-handlers)
      fress/associative-lookup))

(defn fr
  "Fressian roundtrip"
  [x]
  (let [b (fress/write x :handlers custom-write-handlers)
        ;_  (hexdump/print-dump (.array b))
        x' (fress/read b :handlers custom-read-handlers)]
    x'))

(fr [#{5 6} #{:foo}])

This writes the first set, with component-count 2, and the second set, with component-count 1...

:write-tag "persistent-hash-set" :component-count 2
:write-element 6
:write-element 5
:write-tag "persistent-hash-set" :component-count 1
:write-element :foo

... and we read the first set correctly:

:read-tag "persistent-hash-set" :component-count 2
:read-object 6
:read-object 5
:completely-read #{6 5}

But somehow the second set gets component-count 2

:read-tag "persistent-hash-set" :component-count 2
:read-object :foo
Execution error (EOFException) at org.fressian.impl.RawInput/readRawByte (RawInput.java:40).

As you can imagine, this can sometimes work. It will, notably, pass round-trip tests, as long as those tests aren't testing collections with multiple objects of the same tag with different lengths. This can also lead to all kinds of silent data corruption issues--objects being read back in to higher nested data structures than where they originally belonged, coerced to different types, etc etc. It's a real mess.

The byte representation of this data structure [#{5 6} #{:foo}] is:

00000000  e6 ef e3 13 70 65 72 73  69 73 74 65 6e 74 2d 68  |....persistent-h|
00000010  61 73 68 2d 73 65 74 02  06 05 a0 ca f7 cd dd 66  |ash-set........f|
00000020  6f 6f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |oo..............|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040

So we have:

  • e6: packed list of length (2?)
  • ef: 1st element of list: a structure
  • e3: structure tag: unpacked string
  • 13: the length of "persistent-hash-set" "persistent-hash-set"
  • 02: component-count of first set
  • 06: value in first set
  • 05: value in first set
  • a0: STRUCT_CACHED_PACKED #1 (ref to persistent-hash-set struct)
  • ca: namespaced keyword
  • f7: nil
  • cd: PUT_PRIORITY_CACHE
  • dd: Packed string of length 3
  • "foo"
  • padding zeroes

The problem here is that I misunderstood (what I think is) a fundamental invariant assumed by Fressian: structs with the same tag always have the same component-count. Unpacked structs are, I think, fine, since they encode their component counts explicitly--which means tests for complex datatypes might (nervous laughter) happen to pass as well, until a custom handler of this type happens to fall in the first 16 structs used in a Fressian record. Packed structs, however, re-use the component count from the original struct's declaration, which will cause silent, or not-so-silent, data corruption.

You might want to consider documenting this on the https://github.com/clojure/data.fressian/wiki/Creating-custom-handlers page, or offering guardrails to users--perhaps by throwing exceptions when struct handlers write the same struct tag twice with different component counts.

aphyr avatar May 16 '19 19:05 aphyr