rewrite-clj icon indicating copy to clipboard operation
rewrite-clj copied to clipboard

Location is not always populated in parse exceptions

Open lread opened this issue 1 year ago • 3 comments

Version 1.1.47

Platform All

Symptom The :row and :col map entries are not always populated in parsing exceptions data.

Exhibit 1 Case where :row and :col are absent for exception data:

(require '[rewrite-clj.parser :as p])

(try
  (p/parse-string-all "goodsym badsym/ oksym")
  (catch Throwable e
    {:msg (ex-message e)
     :data (ex-data e)}))
;; => {:msg "Invalid symbol: badsym/.",
;;     :data {:type :reader-exception, :ex-kind :reader-error}}

Exhibit 2 Case where position is available in exception data, but notice :line instead of :row.

(try
  (p/parse-string-all ":goodkw :badkw: :okkw")
  (catch Throwable e
    {:msg (ex-message e)
     :data (ex-data e)}))
;; => {:msg "[line 1, col 16] Invalid keyword: badkw:.",
;;     :data
;;     {:type :reader-exception, :ex-kind :reader-error, :file nil, :line 1, :col 16}}

Expected behavior Here's a case that works as expected:

(try                   
  (p/parse-string-all "#:{:a 1}")
  (catch Throwable e
    {:msg (ex-message e)
     :data (ex-data e)}))
;; => {:msg "namespaced map expects a namespace [at line 1, column 3]",
;;     :data {:msg "namespaced map expects a namespace", :row 1, :col 3}}

Diagnosis For default token handling, we delegate to clojure.tools.reader's read-string in these cases, row/col info is not available. In the case of the keyword, we internalized and used clojure.tools.reader read-keyword and it throws using clojure.tools.reader throw mechanisms.

Action I'll take a peek.

lread avatar Jun 30 '24 23:06 lread

It's probably worth studying, absent of rewrite-clj, what positional data tools.reader v1.4.2 throws.

(require '[clojure.tools.reader.edn :as edn]
         '[clojure.tools.reader.reader-types :as rt])


(defn read-all [s]
  (let [rdr (-> s
                rt/string-push-back-reader
                rt/indexing-push-back-reader)]
    (loop [acc []]
      (let [n (edn/read {:eof nil} rdr)]
        (if n
          (recur (conj acc n))
          acc)))))

1 col after the problematic token:

          ;123456789
(read-all "boo bah/ bang")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 9] Invalid symbol: bah/.

But when at eof, col at the end of the problematic token:

          ;1234
(read-all "bah/")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 4] Invalid symbol: bah/.

Unclosed vector at eof, col at end of last item:

          ;1234
(read-all "[boo")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 4] Unexpected EOF while reading item 1 of vector, starting at line 1 and column 1.

...but not consistently, here we are 1 col after eof:

          ;1234567
(read-all "[boo  ")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 7] Unexpected EOF while reading item 1 of vector, starting at line 1 and column 1.

1 char after invalid keyword:


          ;12345678901
(read-all ":boo :bah: :bang")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 11] Invalid keyword: :bah:.

1 char after eof for unterminated string at eof:

          ;1 234
(read-all "\"ab")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 4] Unexpected EOF reading string starting ""ab.

At end col of invalid number:

          ;123
(read-all "42Z")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 3] Invalid number: 42Z.

          ;1234
(read-all "42Z2")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 4] Invalid number: 42Z2.

Namespaced map errors might try to be a little more targeted, but still seems off by 1 col:

          ;1234567
(read-all "#:foo[fah]")
;; => Execution error (ExceptionInfo) at clojure.tools.reader.impl.errors/throw-ex (errors.clj:34).
;;    [line 1, col 7] Namespaced map with namespace foo does not specify a map.

Note that the above does not imply what types of exceptions rewrite-clj does/will throw, it is just a glance at what tools.reader does today.

So tools.reader points you at the end of the problematic token/code and sometimes 1 col after the problematic token/code and sometimes 1 col after eof. Which is probably "good enough" for it.

Since rewrite-clj is carefully adding positions at parse time, I think I'll explore the possibility of doing better. I'm thinking it might be better to point at the start of the problematic token/code. But I'll also explore including both start and end positions. I'll keep in mind that this would technically be a breaking change for folks who rely on locations in exception data.

One thing to note is that I am talking about parsing only here. Tools.reader is also employed when sexpr is called on a string node. I don't have plans to provide positional data on throws for sexpr.

lread avatar Jul 04 '24 18:07 lread

Note that work like this has also been done in the rewrite-clj + tools reader fork in clj-kondo

borkdude avatar Jul 04 '24 18:07 borkdude

Oh handy to know, thanks!

lread avatar Jul 04 '24 19:07 lread

Although the location info would be great to have, for the cases I have encountered like "Invalid symbol: ..." and "Invalid number: ...", the actual invalid thing gets reported and that is usually helpful enough to find where the issue is (at least when examination takes place manually).


Slightly related to:

Case where position is available in exception data, but notice :line instead of :row.

When looking over many (100 or more) error messages, I'm finding that I'd like the location information to be in a consistent format and either always at the beginning or end of the message. It may be that what I'd really like is to be able to sort by error message (or process results programmatically) so the previous sentence may be a manifestation of an XY problem 😅

May be such a thing is not much benefit to most users, too much work (for too little return), and/or possibly bad (because some programs may already depend on particular error message formats).

I tried to find an existing issue that touches on this point, but this was the closest I found.

sogaiu avatar Jun 22 '25 12:06 sogaiu