tech.ml.dataset
A Clojure high performance data processing system
https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/tensor-.3Edataset.20not.20working.20for.202-d.20arrays ``` (tech.v3.dataset.tensor/tensor->dataset (to-array-2d [[1 1] [1 2] [2 2] [2 3]])) ``` fails, while it works for other 2D structures (seq-of-seq, tensor). Any reason for this? Similar here:...
Partition, I guess, would turn a dataset into a sequence of datasets, with similar semantics and arguments to `clojure.core/partition`. `partition-by` is a little less clear to me, perhaps...
Is this expected? ```clojure user> (ds/print-all (ds/->dataset {:a (range 100)})) _unnamed [100 1]: | :a | |---:| | 0 | | 1 | | 2 | | 3 | |...
Perhaps something like `./scripts/run-minimal-tests` that ran a nice subset of tests that doesn't rely on too many native libs or big datasets as a sanity check for people making minimally...
Users should not have to provide a seq or seq-like thing to write sequences of datasets to either arrow or parquet. A transduce-compatible rf function would be a better fit for these...
Just had a user who wanted to build a local copy get a bit in the weeds - I think this could be expanded upon a bit.
tr, td, sym, etc.
Repl'ing some datasets and having a lot of fun, but came across a case that didn't print well related to `ds/rows`: ```clj > (def m {:aaaaaaaaaaa 1 :bbbbbbbbbbb 2 :ccccccccccc...
```clojure (spit "test.json" "[ {\"test\": 1, \"time-period\": \"2024-06-20\"}, {\"test\": 2, \"time-period\": \"2024-06-21\"}, {\"test\": 3, \"time-period\": \"2024-06-22\"}]") (tech.v3.dataset/->dataset "test.json" {:key-fn keyword :parser-fn {:time-period :local-date}}) ``` The exception: ``` 1. Unhandled clojure.lang.ArityException...
As we were working on this issue https://clojure.atlassian.net/browse/CLJ-2698, we were looking in the world for cases of defprotocol methods with ^double and ^long return type hints (which are not valid...