ordered icon indicating copy to clipboard operation
ordered copied to clipboard

Issues with ordered/map data literals

Open yuhan0 opened this issue 7 months ago • 6 comments
trafficstars

The #ordered/map data literal constructs an invalid data structure when read in the REPL, but not when used in a def form.

Minimal repro at the REPL:

$ clj -Sdeps '{:deps {org.flatland/ordered {:mvn/version "1.15.12"}}}'
Clojure 1.12.0
user=> (require '[flatland.ordered.map :as o])
nil

user=> (get (o/ordered-map [[:a 1]]) :a)
1

user=> (get #ordered/map [[:a 1]] :a)
Execution error (ClassCastException) at flatland.ordered.map.OrderedMap/valAt (map.clj:51).
class clojure.lang.PersistentVector cannot be cast to class clojure.lang.MapEntry (clojure.lang.PersistentVector and clojure.lang.MapEntry are in unnamed module of loader 'app')

Extended repro as source file:

(ns repro (:require [flatland.ordered.map :as o])

(def m1 #ordered/map [[:a 1]])

(let [m #ordered/map [[:a 1]]]
  (def m2 m))

(#(def m3 #ordered/map [[:a 1]]))

(keys m1) ;; => (:a)
(keys m2) ;; boom
(keys m3) ;; boom

(map class m1) ;; => (clojure.lang.MapEntry)
(map class m2) ;; => (clojure.lang.PersistentVector)
(map class m3) ;; => (clojure.lang.PersistentVector)

Something about Clojure's reader or this library's internals appears to be preventing the vectors from being cast into map entries or vice versa, causing the constructed object to be invalid and blow up when one tries to manipulate it.

The relevant data reader is https://github.com/clj-commons/ordered/blob/b5315081f24858e3169923f0de6b9f08bc0dda5a/src/flatland/ordered/map.clj#L210

Note: it's possible to delay construction of the object to runtime as in the ordered-map-reader-cljs, but as hiredman pointed out in the linked Slack discussion, this would break the ability for macros to work with these ordered-map literals.

I attempted to fix it by having the reader explicitly cast to MapEntry, but this also fails:

(defn my-ordered-map-reader
  "does not work!"
  [coll]
  (into (ordered-map)
        (map (fn [[k v]] (clojure.lang.MapEntry/create k v)))
        coll))

(set! *data-readers*
      (assoc *data-readers* 'o/map #'my-ordered-map-reader))

(map class #o/map ([:a 1])) ;; => (clojure.lang.PersistentVector)
(keys #o/map ([:a 1])) ;; boom

More discussion and context at: https://clojurians.slack.com/archives/C03S1KBA2/p1742444212617369

yuhan0 avatar Mar 20 '25 09:03 yuhan0

Note that read/read-string works as expected:

(def m4 (read-string "#ordered/map ([:a 1])"))

(keys m4) ;; => (:a)

yuhan0 avatar Mar 20 '25 09:03 yuhan0

It appears that this happens to any OrderedMap at read time, even if it was previously constructed in a valid way :

(def valid-data (ordered-map [[:a 1]]))
(keys valid-data) ;; => (:a)
(map class valid-data) ;; => (clojure.lang.MapEntry)

(defn ordered-map-returning-reader [_]
  valid-data)

(set! *data-readers*
      (assoc *data-readers*
             'o/constant
             #'ordered-map-returning-reader))

;; Appears to return the same thing
(quote #o/constant _) ;; => #ordered/map ([:a 1])

;; But somehow the entries got transformed into vectors
(map class #o/constant _) ;; => (clojure.lang.PersistentVector)

yuhan0 avatar Mar 20 '25 09:03 yuhan0

@yuhan0 For some reason, via the reader macro path, what should have been a MapEntry is a vector in the backing map. If we change this:

  (valAt [this k not-found]
    (if-let [^MapEntry e (.get ^Map backing-map k)]
      (second e)
      #_(.val e)
      not-found))

in the OrderedMap type, then both the reader macro and the normal function constructor work properly. But it would be good to find out why the reader macro path causes a vector instead of a MapEntry to be present in the backing map.

borkdude avatar Mar 20 '25 10:03 borkdude

A repro of the problem:

user=> (defn foo [v] (clojure.lang.MapEntry. (first v) (second v)))
#'user/foo
user=> (foo [1 2])
[1 2]
user=> (type (foo [1 2]))
clojure.lang.MapEntry
user=> (set! *data-readers* (assoc *data-readers* 'map/entry foo))
...
user=> #map/entry [1 2]
[1 2]
user=> (type #map/entry [1 2])
clojure.lang.PersistentVector

borkdude avatar Mar 20 '25 11:03 borkdude

Update: the root cause was figured out in the above Slack discussion thanks to @p-himik. Here are the relevant quotes (slightly edited for context):

  • To execute something like (type #ordered/map ([1 2])) you have to compile the code corresponding to that form
  • In order to compile it, you have to have it in a "compilable" form - something in a static runtime-independent form, so it cannot rely on runtime values
  • For that, clojure.lang.Compiler.ObjExpr#emitConstants is called, which ends up calling clojure.core/pr-on
  • The result of that printing is then properly compiled and executed

Since printing some runtime value can result in a value that's different from the original, we see such confusing behaviors. My experience with #inst was exactly the same one. It doesn't happen with things like (type (ordered-map [1 2])) because everything here is already runtime-independent. In other words - the problem is that tagged literals produce values instead of being expanded into code. And those values have to be printed and re-read during compilation.

Instances of a custom type created with deftype go through the if(value instanceof IType) branch, and it runs instead of the printing. So the issue is not that an ordered map gets printed in a wrong manner, but that map entries get treated as vectors, as borkdude has indicated. Printing is the default branch in clojure.lang.Compiler.ObjExpr#emitValue that gets triggered only when nothing else matches.

Specifically since OrderedMap is a deftype, it gets handled by the following branch, which recursively emits the contents of its fields and uses them to reconstruct the 'static runtime-independent form' of the ordered map: https://github.com/clojure/clojure/blob/fb22fd778a272b034684a4ee94509552b46ee8a9/src/jvm/clojure/lang/Compiler.java#L5314

But the internal fields order and backing-map both contain MapEntry values which implement IPersistentVector, and they get handled by this branch https://github.com/clojure/clojure/blob/fb22fd778a272b034684a4ee94509552b46ee8a9/src/jvm/clojure/lang/Compiler.java#L5356 which emits a literal vector in place of the original value's MapEntry - This violates all sorts of assumptions in the implemented methods which call .key and .val on these values.

yuhan0 avatar Mar 21 '25 12:03 yuhan0

As a library, there doesn't seem to be much one can do to bypass the above mechanisms and fix the situation 😕 Besides workarounds like #78 which weaken the internal invariants, conceding the fact that the compiler will construct instances of the datatype in a manner that is outside our control, when tagged literals are involved

This is related to a broader issue surrounding Clojure's treatment of non-built-in types - see this Ask Clojure post https://ask.clojure.org/index.php/14474/custom-literal-collections-reach-parity-clojures-collections

yuhan0 avatar Mar 21 '25 12:03 yuhan0