xtdb icon indicating copy to clipboard operation
xtdb copied to clipboard

tx-logs containing `clojure.lang.PersistentVector$ChunkedSeq` should still be readable

Open refset opened this issue 3 years ago • 1 comments

Adds backwards compatibility for tx-logs that inadvertently contain serializations of clojure.lang.PersistentVector$ChunkedSeq instead of plain vectors.

How these serializations might have ended up on the tx-log is unclear to me for the moment, and resolving this alone isn't fully resolving the broader replay issue yet (so a related fix is probably needed also).

The proposed fix is necessary for backwards compatibility since at least the change in https://github.com/xtdb/xtdb/pull/1720

The H2 tx-log where this was observed was written by 20.06-1.8.5-alpha (or possibly even earlier) but I haven't done a full analysis of all possible changes since then, and this is what was shown prior to fixing:

Execution error (IllegalStateException) at xtdb.io/conform-tx-log-entry (io.clj:367).
unexpected value on tx-log: '([:crux.tx/put "0ea5661d363f521f8133b664048b8feb3ad0aeba" "99266546a2219eda420f92d3f0b10d49ec859115"])' of type 'class clojure.lang.PersistentVector$ChunkedSeq'

refset avatar Aug 25 '22 12:08 refset

The subsequent problem to solve is a "missing docs" error, and that looks to be because the ChunkedSeq serialization is losing the edn tag, and therefore fetch-docs is being passed hex strings. These are valid hex strings though, and using xtdb.codec/id-edn-reader to coerce them manually and call fetch-docs works fine.

A fix for this should probably be included in this PR also, and might look something like:

(defmulti hex-string-event->edn-event
  (fn [op :as _tx-event]
    op))

(defmethod hex-string-event->edn-event :crux.tx/put [_op k v start-valid-time end-valid-time]
  [_op (c/id-edn-reader k) (c/id-edn-reader v) start-valid-time end-valid-time])

(defmethod hex-string-event->edn-event :crux.tx/delete [_op k start-valid-time end-valid-time]
  [_op (c/id-edn-reader k) start-valid-time end-valid-time])

(defmethod hex-string-event->edn-event :crux.tx/match [_op k v at-valid-time]
  [_op (c/id-edn-reader k) (c/id-edn-reader v) at-valid-time])

;; in conform-tx-log-entry ...
(instance? clojure.lang.PersistentVector$ChunkedSeq entry) {:xtdb.tx.event/tx-events (map hex-string-event->edn-event entry)})

However, it's very likely that the timestamps would also need coercing, and we'd likely need some tests.

I am parking this investigation for now - but if someone is reading this and is at all concerned about the possible impact: please get it touch!

refset avatar Aug 25 '22 12:08 refset

Hi @refset, do you already have an idea about when a fix for this issue will be released? Thanks a lot!

mallt avatar Oct 10 '22 08:10 mallt

Hey @mallt - on reflection I think it may be simplest to perform a one-time migration of the underlying tx-log by simply converting the clojure.lang.PersistentVector$ChunkedSeq instances to vectors, without modifying XT at all. This would be an offline migration requiring a sql export and sql import. Could that be acceptable for your usage?

refset avatar Oct 13 '22 13:10 refset

Hi @refset, yes that would be perfect for me, thank you!

mallt avatar Oct 13 '22 13:10 mallt

Hi @refset, could you explain how I can perform this one-time migration to convert the PersistentVector$ChunkedSeq instances to vectors? Thank you!

mallt avatar Oct 14 '22 05:10 mallt

It's something like:

  1. run some SQL to export the full log, e.g. https://github.com/xtdb/xtdb/blob/00d916139c62b47765ccc3f17458d7e46b230831/modules/jdbc/src/xtdb/jdbc.clj#L167
  2. map over the entries 2.2) thaw the values, e.g. https://github.com/xtdb/xtdb/blob/00d916139c62b47765ccc3f17458d7e46b230831/modules/jdbc/src/xtdb/jdbc.clj#L72 2.2) coerce the :tx-events clojure.lang.PersistentVector$ChunkedSeq to vectors 2.3) find and replace hex strings with #xtdb/id tagged strings 2.4) re-freeze the entries
  3. run SQL to import the entries, replacing the existing log

refset avatar Oct 14 '22 21:10 refset

Hi @refset, I've been trying to fix the issue with the following code:

(defn get-records-to-fix
  []
  (let [res (jdbc/execute! ds ["select * from tx_events"]
                           {:builder-fn rs/as-unqualified-lower-maps})
        mapped (mapv (fn [x] (assoc x :thaw (nippy/thaw (:v x))))
                     res)
        filtered (filterv (fn [x] (= clojure.lang.PersistentVector$ChunkedSeq (type (:thaw x))))
                          mapped)
        vectorized (mapv (fn [x] (assoc x :fixed-thaw (vec (:thaw x))))
                         filtered)
        with-fixed-v (mapv (fn [x] (assoc x :fixed-v (nippy/freeze (:fixed-thaw x))))
                           vectorized)]
    with-fixed-v))

(let [records (get-records-to-fix)]
  (mapv (fn [x]
          (jdbc/execute! ds ["update tx_events set V = ? where EVENT_OFFSET = ?" (:fixed-v x) (:event_offset x)]))
        records))

When testing the updated database with version 20.07-1.10.0-beta however there are still missing documents in the database.

I notice I didn't execute step 2.3) find and replace hex strings with #xtdb/id tagged strings. Could you explain how this should be done (I'm using an old version of the db so I suppose I need to use #crux/id)?

Thanks!

mallt avatar Nov 03 '22 09:11 mallt

Hey @mallt sorry to keep you waiting! I am not certain enough where in the data structure such hex strings may appear, but I think you could safely do a simple walk to find all 40 char strings and assume that they are IDs. Then you can convert (replace) them using https://github.com/xtdb/xtdb/blob/c41f4dee8af8c9673c0f04a742779665ecde4146/core/src/xtdb/codec.clj#L723

refset avatar Nov 08 '22 22:11 refset

Hi @refset , all I'm doing is applying vec to the thaw: (assoc x :fixed-thaw (vec (:thaw x))), so I believe the data should be unaffected. Or do you think the ID could be removed by applying vec?

Thanks!

mallt avatar Nov 09 '22 08:11 mallt

IIRC the problematic thawed data won't contain the necessary edn tags - the IDs should have the tag #crux/id but appear as plain strings. I don't really understand how this can have happened but that's how it seemed to me (there's a chance I may have been mistaken however). Therefore, using vec to fix the top-level issue is not enough, and you have to also manipulate the contents of individual put / match / delete ops, to swap the string ID values for properly tagged #crux/id values.

Or do you think the ID could be removed by applying vec?

I don't think vec would strip the tag :thinking:

refset avatar Nov 10 '22 12:11 refset

Hi @refset, I've been able to migrate my databases to solve this issue, thank you for all the help!

mallt avatar Dec 21 '22 13:12 mallt

Hey @mallt I only just read your last reply...that's really great to hear though, and thanks for your patience & perseverance. I hope the rest of your XTDB experience is much smoother :slightly_smiling_face:

refset avatar Jan 24 '23 20:01 refset