Is the datatype :instant supported?

Open: alza-bitz opened this issue 9 months ago • 1 comment

According to the docs (https://techascent.github.io/tech.ml.dataset/supported-datatypes.html), :instant is a supported data type.

However, with this example code (adapted from the same page), running on version 7.053, I get "No matching clause: :instant":

(ns dataset
  (:require [tech.v3.dataset-api :as ds]))

(def data-maps (for [idx (range 10)]
                       {:a idx
                        :b (str (.plusDays (java.time.LocalDateTime/now) idx))}))

data-maps

;; works
(:b (ds/->dataset data-maps))

;; works
(:b (ds/->dataset data-maps {:parser-fn {:b [:local-date "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS"]}}))

;; works
(:b (ds/->dataset data-maps {:parser-fn {:b [:local-date-time "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS"]}}))

;; not working
(:b (ds/->dataset data-maps {:parser-fn {:b [:instant "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS"]}}))
;; Execution error (IllegalArgumentException) at tech.v3.dataset.io.datetime/datetime-formatter-parse-str-fn (datetime.clj:150).
;; No matching clause: :instant

Thanks 🙏

alza-bitz, Mar 20 '25 13:03

For sure instants are supported:

user> (->> (for [idx (range 10)]
             {:a idx
              :b (.plus (java.time.Instant/now) (java.time.Period/ofDays idx))})
           (ds/->>dataset))
_unnamed [10 2]:

| :a |                          :b |
|---:|-----------------------------|
|  0 | 2025-03-20T15:18:22.720167Z |
|  1 | 2025-03-21T15:18:22.720191Z |
|  2 | 2025-03-22T15:18:22.720196Z |
|  3 | 2025-03-23T15:18:22.720202Z |
|  4 | 2025-03-24T15:18:22.720203Z |
|  5 | 2025-03-25T15:18:22.720204Z |
|  6 | 2025-03-26T15:18:22.720206Z |
|  7 | 2025-03-27T15:18:22.720207Z |
|  8 | 2025-03-28T15:18:22.720208Z |
|  9 | 2025-03-29T15:18:22.720210Z |
user> (map meta (ds/columns *1))
({:name :a, :datatype :int64, :n-elems 10}
 {:name :b, :datatype :packed-instant, :n-elems 10})

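I think with what you're trying to do you're going to run into time zone confusion: the strings in your example come from LocalDateTime, so they carry no zone or offset, while an Instant is a fixed point on the UTC timeline.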

If the string format you're dealing with is the one you showed in this example, I would start by reading them in as strings, and then use row-map to convert them into something computable with very explicit time zone handling. I wouldn't reach for :parser-fn in this case.
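A minimal sketch of that approach, assuming the strings are ISO-8601 local date-times as in the example above. The names `source-zone` and `local-str->instant` are hypothetical helpers, not part of the library, and the UTC zone is only a placeholder for whatever zone the strings were actually written in:

```clojure
(ns instant-example
  (:require [tech.v3.dataset :as ds])
  (:import [java.time Instant LocalDateTime ZoneId]))

;; Assumption: the local date-time strings were produced in this zone.
;; Replace "UTC" with the zone that actually applies to your data.
(def source-zone (ZoneId/of "UTC"))

(defn local-str->instant
  "Parses an ISO-8601 local date-time string (no zone or offset),
  interprets it in `zone`, and returns a java.time.Instant."
  ^Instant [^String s ^ZoneId zone]
  (-> (LocalDateTime/parse s)
      (.atZone zone)
      (.toInstant)))

;; Same shape of input as in the original report.
(def data-maps
  (for [idx (range 10)]
    {:a idx
     :b (str (.plusDays (LocalDateTime/now) idx))}))

(def parsed
  (-> (ds/->dataset data-maps {:parser-fn {:b :string}}) ; keep :b as plain strings
      (ds/row-map (fn [{:keys [b]}]
                    ;; The returned map is merged back in as a new column.
                    {:b-instant (local-str->instant b source-zone)}))))

;; Inspect the resulting column datatypes.
(map meta (ds/columns parsed))
```

The zone is passed explicitly so the conversion never depends on the JVM's default time zone.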

hth (:

harold, Mar 20 '25 15:03