data.fressian
data.fressian copied to clipboard
Write handlers might want to warn when writing structs with variable component counts
This is, I think, not a bug in Fressian. It might be considered a documentation bug? Or maybe a lack of implementation safeguards? Regardless, I read the documentation and fucked this up badly for several years, so I'm writing this up as a resource for others.
I've been hitting EOF errors when trying to deserialize Fressian records written by Jepsen. It looked like if a custom handler is used to write two objects, the second object somehow gets the component-count of the first. For instance:
(require '[clojure.data.fressian :as fress])
(import (org.fressian.handlers WriteHandler ReadHandler))
(def custom-write-handlers
(-> {clojure.lang.PersistentHashSet
{"persistent-hash-set" (reify WriteHandler
(write [_ w set]
(prn :write-tag "persistent-hash-set"
:component-count (count set))
(.writeTag w "persistent-hash-set"
(count set))
(doseq [e set]
(prn :write-element e)
(.writeObject w e))))}}
(merge fress/clojure-write-handlers)
fress/associative-lookup
fress/inheritance-lookup))
(def custom-read-handlers
(-> {"persistent-hash-set" (reify ReadHandler
(read [_ rdr tag component-count]
(prn :read-tag tag :component-count component-count)
(loop [i component-count
s (transient #{})]
(if (pos? i)
(recur (dec i)
(let [o (.readObject rdr)]
(prn :read-object o)
(conj! s o)))
(let [s (persistent! s)]
(prn :completely-read s)
s)))))}
(merge fress/clojure-read-handlers)
fress/associative-lookup))
(defn fr
"Fressian roundtrip"
[x]
(let [b (fress/write x :handlers custom-write-handlers)
;_ (hexdump/print-dump (.array b))
x' (fress/read b :handlers custom-read-handlers)]
x'))
(fr [#{5 6} #{:foo}])
This writes the first set, with component-count 2, and the second set, with component-count 1...
:write-tag "persistent-hash-set" :component-count 2
:write-element 6
:write-element 5
:write-tag "persistent-hash-set" :component-count 1
:write-element :foo
... and we read the first set correctly:
:read-tag "persistent-hash-set" :component-count 2
:read-object 6
:read-object 5
:completely-read #{6 5}
But somehow the second set gets component-count 2
:read-tag "persistent-hash-set" :component-count 2
:read-object :foo
Execution error (EOFException) at org.fressian.impl.RawInput/readRawByte (RawInput.java:40).
As you can imagine, this can sometimes work. It will, notably, pass round-trip tests, as long as those tests aren't testing collections with multiple objects of the same tag with different lengths. This can also lead to all kinds of silent data corruption issues--objects being read back in to higher nested data structures than where they originally belonged, coerced to different types, etc etc. It's a real mess.
The byte representation of this data structure [#{5 6} #{:foo}]
is:
00000000 e6 ef e3 13 70 65 72 73 69 73 74 65 6e 74 2d 68 |....persistent-h|
00000010 61 73 68 2d 73 65 74 02 06 05 a0 ca f7 cd dd 66 |ash-set........f|
00000020 6f 6f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |oo..............|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000040
So we have:
- e6: packed list of length (2?)
- ef: 1st element of list: a structure
- e3: structure tag: unpacked string
- 13: the length of "persistent-hash-set" "persistent-hash-set"
- 02: component-count of first set
- 06: value in first set
- 05: value in first set
- a0: STRUCT_CACHED_PACKED #1 (ref to persistent-hash-set struct)
- ca: namespaced keyword
- f7: nil
- cd: PUT_PRIORITY_CACHE
- dd: Packed string of length 3
- "foo"
- padding zeroes
The problem here is that I misunderstood (what I think is) a fundamental invariant assumed by Fressian: structs with the same tag always have the same component-count. Unpacked structs are, I think, fine, since they encode their component counts explicitly--which means tests for complex datatypes might (nervous laughter) happen to pass as well, until a custom handler of this type happens to fall in the first 16 structs used in a Fressian record. Packed structs, however, re-use the component count from the original struct's declaration, which will cause silent, or not-so-silent, data corruption.
You might want to consider documenting this on the https://github.com/clojure/data.fressian/wiki/Creating-custom-handlers page, or offering guardrails to users--perhaps by throwing exceptions when struct handlers write the same struct tag twice with different component counts.