tech.ml.dataset
A Clojure high performance data processing system
```console
Unhandled java.lang.Exception
Column appId has value whose length (109583) is greater than max-chars-per-column (65536).
```
The fix is to use charred for writing csv files.
I changed just one line, but the diff looks huge because of some whitespace mess.
Per the [bug in categorical mapping](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/bug.20in.20categorical.20mapping.20.3F) discussion, the simple scheme of assigning categories based on the size of the map is insufficient in the face of user input: before
```
user>...
```
```
(tech.v3.dataset.column/new-column :a [])
;; => #tech.v3.dataset.column[0]
;; :a
;; []
```
Is there a way to force a datatype during the creation of a column?
Following the Zulip discussion about [pr-str printing of columns](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/pr-str.20printing.20of.20columns), we suggest the following format for printing columns (this is basically @joinr's suggestion in that thread, with some minor changes)....
We want to define a data structure specification for a query that can become canonical within tech.ml.dataset. This will help make query-related functions smarter because it will be introspectable. A...
This may never get merged. I am just working on some simple analysis of the data at load time, and then allowing find-column to do efficient range queries if the data is ordered.
To make the code related to categorical mapping less brittle, I think we should check and fail in more situations. One of them is this:
```clojure
(require '[tech.v3.dataset.categorical :as ds-cat]
         '[tech.v3.dataset.modelling...
```
Hi, my work requires me to implement writing nested types in the Arrow format. Currently I use tech.ml.dataset to convert Clojure columnar data into the Arrow format for processing in C++....
See https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/tribuo.20prediction.20datatype.20does.20not.20match
```clojure
(ns scicloj.ml.tribuo
  (:require [tech.v3.dataset :as ds]
            [tech.v3.dataset.modelling :as ds-model]
            [tech.v3.libs.tribuo :as tribuo])
  (:import (com.oracle.labs.mlrg.olcut.config DescribeConfigurable)
           (org.tribuo.classification.sgd.linear LogisticRegressionTrainer)))

(def logreg-trainer (LogisticRegressionTrainer.))

(def dummy-ds
  (-> (ds/->dataset {:x [1 1]...
```