tx-logs containing `clojure.lang.PersistentVector$ChunkedSeq` should still be readable
Adds backwards compatibility for tx-logs that inadvertently contain serializations of clojure.lang.PersistentVector$ChunkedSeq instead of plain vectors.
How these serializations might have ended up on the tx-log is unclear to me for the moment, and resolving this alone isn't fully resolving the broader replay issue yet (so a related fix is probably needed also).
The proposed fix is necessary for backwards compatibility since at least the change in https://github.com/xtdb/xtdb/pull/1720
The H2 tx-log where this was observed was written by 20.06-1.8.5-alpha (or possibly even earlier) but I haven't done a full analysis of all possible changes since then, and this is what was shown prior to fixing:
Execution error (IllegalStateException) at xtdb.io/conform-tx-log-entry (io.clj:367).
unexpected value on tx-log: '([:crux.tx/put "0ea5661d363f521f8133b664048b8feb3ad0aeba" "99266546a2219eda420f92d3f0b10d49ec859115"])' of type 'class clojure.lang.PersistentVector$ChunkedSeq'
The subsequent problem to solve is a "missing docs" error, and that looks to be because the ChunkedSeq serialization is losing the edn tag, and therefore fetch-docs is being passed hex strings. These are valid hex strings though, and using xtdb.codec/id-edn-reader to coerce them manually and call fetch-docs works fine.
A fix for this should probably be included in this PR also, and might look something like:
(defmulti hex-string-event->edn-event
(fn [op :as _tx-event]
op))
(defmethod hex-string-event->edn-event :crux.tx/put [_op k v start-valid-time end-valid-time]
[_op (c/id-edn-reader k) (c/id-edn-reader v) start-valid-time end-valid-time])
(defmethod hex-string-event->edn-event :crux.tx/delete [_op k start-valid-time end-valid-time]
[_op (c/id-edn-reader k) start-valid-time end-valid-time])
(defmethod hex-string-event->edn-event :crux.tx/match [_op k v at-valid-time]
[_op (c/id-edn-reader k) (c/id-edn-reader v) at-valid-time])
;; in conform-tx-log-entry ...
(instance? clojure.lang.PersistentVector$ChunkedSeq entry) {:xtdb.tx.event/tx-events (map hex-string-event->edn-event entry)})
However, it's very likely that the timestamps would also need coercing, and we'd likely need some tests.
I am parking this investigation for now - but if someone is reading this and is at all concerned about the possible impact: please get it touch!
Hi @refset, do you already have an idea about when a fix for this issue will be released? Thanks a lot!
Hey @mallt - on reflection I think it may be simplest to perform a one-time migration of the underlying tx-log by simply converting the clojure.lang.PersistentVector$ChunkedSeq instances to vectors, without modifying XT at all. This would be an offline migration requiring a sql export and sql import. Could that be acceptable for your usage?
Hi @refset, yes that would be perfect for me, thank you!
Hi @refset, could you explain how I can perform this one-time migration to convert the PersistentVector$ChunkedSeq instances to vectors? Thank you!
It's something like:
- run some SQL to export the full log, e.g. https://github.com/xtdb/xtdb/blob/00d916139c62b47765ccc3f17458d7e46b230831/modules/jdbc/src/xtdb/jdbc.clj#L167
- map over the entries
2.2) thaw the values, e.g. https://github.com/xtdb/xtdb/blob/00d916139c62b47765ccc3f17458d7e46b230831/modules/jdbc/src/xtdb/jdbc.clj#L72
2.2) coerce the
:tx-eventsclojure.lang.PersistentVector$ChunkedSeqto vectors 2.3) find and replace hex strings with#xtdb/idtagged strings 2.4) re-freeze the entries - run SQL to import the entries, replacing the existing log
Hi @refset, I've been trying to fix the issue with the following code:
(defn get-records-to-fix
[]
(let [res (jdbc/execute! ds ["select * from tx_events"]
{:builder-fn rs/as-unqualified-lower-maps})
mapped (mapv (fn [x] (assoc x :thaw (nippy/thaw (:v x))))
res)
filtered (filterv (fn [x] (= clojure.lang.PersistentVector$ChunkedSeq (type (:thaw x))))
mapped)
vectorized (mapv (fn [x] (assoc x :fixed-thaw (vec (:thaw x))))
filtered)
with-fixed-v (mapv (fn [x] (assoc x :fixed-v (nippy/freeze (:fixed-thaw x))))
vectorized)]
with-fixed-v))
(let [records (get-records-to-fix)]
(mapv (fn [x]
(jdbc/execute! ds ["update tx_events set V = ? where EVENT_OFFSET = ?" (:fixed-v x) (:event_offset x)]))
records))
When testing the updated database with version 20.07-1.10.0-beta however there are still missing documents in the database.
I notice I didn't execute step 2.3) find and replace hex strings with #xtdb/id tagged strings. Could you explain how this should be done (I'm using an old version of the db so I suppose I need to use #crux/id)?
Thanks!
Hey @mallt sorry to keep you waiting! I am not certain enough where in the data structure such hex strings may appear, but I think you could safely do a simple walk to find all 40 char strings and assume that they are IDs. Then you can convert (replace) them using https://github.com/xtdb/xtdb/blob/c41f4dee8af8c9673c0f04a742779665ecde4146/core/src/xtdb/codec.clj#L723
Hi @refset , all I'm doing is applying vec to the thaw: (assoc x :fixed-thaw (vec (:thaw x))), so I believe the data should be unaffected. Or do you think the ID could be removed by applying vec?
Thanks!
IIRC the problematic thawed data won't contain the necessary edn tags - the IDs should have the tag #crux/id but appear as plain strings. I don't really understand how this can have happened but that's how it seemed to me (there's a chance I may have been mistaken however). Therefore, using vec to fix the top-level issue is not enough, and you have to also manipulate the contents of individual put / match / delete ops, to swap the string ID values for properly tagged #crux/id values.
Or do you think the ID could be removed by applying vec?
I don't think vec would strip the tag :thinking:
Hi @refset, I've been able to migrate my databases to solve this issue, thank you for all the help!
Hey @mallt I only just read your last reply...that's really great to hear though, and thanks for your patience & perseverance. I hope the rest of your XTDB experience is much smoother :slightly_smiling_face: