tech.ml.dataset

A Clojure high performance data processing system

Results 33 tech.ml.dataset issues

```console
Unhandled java.lang.Exception
Column appId has value whose length (109583) is greater than max-chars-per-column (65536).
```

The fix is to use charred for writing CSV files.
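As a rough illustration of writing such a value with charred directly, here is a minimal sketch. It assumes the `charred.api/write-csv` function taking a writer and a sequence of rows; the data and the names `long-value`, `rows`, and `out` are hypothetical and only mirror the sizes in the error message. It is not the change made in tech.ml.dataset itself.

```clojure
(require '[charred.api :as charred])

;; Hypothetical data: a single field with the length reported in the error,
;; well above the 65536-character limit.
(def long-value (apply str (repeat 109583 "x")))
(def rows [["appId"] [long-value]])

;; Write to an in-memory writer so the sketch needs no files.
(def out (java.io.StringWriter.))
(charred/write-csv out rows)

;; The written text contains the whole long field.
(> (count (str out)) 109583)
```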

I changed only one line, but the diff looks huge because of some whitespace mess.

Per [bug in categorical mapping](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/bug.20in.20categorical.20mapping.20.3F) discussion, the simple scheme of assigning categories based on the size of the map is insufficient in the face of user input: before ``` user>...
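The kind of collision described can be sketched in plain Clojure. This is an assumption about the general problem, not the library's actual code: `naive-assign`, `safer-assign`, and `user-lookup` are hypothetical names, and the max-based alternative is only one possible remedy.

```clojure
(defn naive-assign
  "Assigns `category` the next integer, taken to be the current size of `lookup`."
  [lookup category]
  (if (contains? lookup category)
    lookup
    (assoc lookup category (count lookup))))

;; Starting from an empty table, the scheme yields the distinct values 0, 1, 2.
(reduce naive-assign {} [:a :b :c]) ;; => {:a 0, :b 1, :c 2}

;; A user-supplied table need not start at 0 or be contiguous.
(def user-lookup {:a 1 :b 2})

;; The table has size 2, a value :b already uses, so :c collides with :b.
(naive-assign user-lookup :c) ;; => {:a 1, :b 2, :c 2}

(defn safer-assign
  "Assigns `category` one more than the largest value already in use."
  [lookup category]
  (if (contains? lookup category)
    lookup
    (assoc lookup category
           (if (seq lookup) (inc (apply max (vals lookup))) 0))))

(safer-assign user-lookup :c) ;; => {:a 1, :b 2, :c 3}
```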

```clojure
(tech.v3.dataset.column/new-column :a [])
;; => #tech.v3.dataset.column[0]
;; :a
;; []
```

Is there a way to force a datatype when creating a column?
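One possible approach, offered as an untested sketch rather than the maintainers' answer, is to pass a typed container instead of a plain vector. It assumes that `tech.v3.datatype/make-container` accepts a datatype and an element count, and that `new-column` keeps the element type of the container it is given; the names `empty-floats` and `col` are illustrative.

```clojure
(require '[tech.v3.datatype :as dtype]
         '[tech.v3.dataset.column :as ds-col])

;; An empty container whose element type is fixed up front.
(def empty-floats (dtype/make-container :float64 0))

;; Assumption: the column takes its datatype from the container.
(def col (ds-col/new-column :a empty-floats))

;; Inspect the resulting element type and length.
(dtype/elemwise-datatype col)
(dtype/ecount col)
```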

Following the Zulip discussion about [pr-str printing of columns](https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/pr-str.20printing.20of.20columns), we suggest the following format for printing columns (this is basically @joinr's suggestion in that thread, with some minor changes)....

We want to define a data structure specification for a query that can become canonical within tech.ml.dataset. This will help make query-related functions smarter because it will be introspectable. A...

This may never get merged. I am just working on some simple analysis of the data at load time, which would then allow find-column to do efficient range queries if the data is ordered.
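The idea of range queries over ordered data can be sketched with a plain binary search. This is a generic illustration in core Clojure, not the code in this PR; `lower-bound` and `range-indexes` are hypothetical helper names.

```clojure
(defn lower-bound
  "Index of the first element of the sorted vector `v` that is >= `x`,
  or (count v) when no such element exists."
  [v x]
  (loop [lo 0
         hi (count v)]
    (if (< lo hi)
      (let [mid (quot (+ lo hi) 2)]
        (if (neg? (compare (nth v mid) x))
          (recur (inc mid) hi)
          (recur lo mid)))
      lo)))

(defn range-indexes
  "Half-open index range [start end) of the elements e with lo <= e < hi.
  Takes O(log n) comparisons instead of scanning every element."
  [v lo hi]
  [(lower-bound v lo) (lower-bound v hi)])

;; Elements 3, 3 and 5 sit at indexes 1, 2 and 3.
(range-indexes [1 3 3 5 8 13] 3 8) ;; => [1 4]
```

A check at load time that the data is sorted is what makes this shortcut valid; on unordered data a full scan is still required.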

To make the code related to categorical mapping less brittle, I think we should check and fail in more situations. This is one of them: ```clojure (require '[tech.v3.dataset.categorical :as ds-cat] '[tech.v3.dataset.modelling...

Hi, my work requires me to implement writing nested types in Arrow format. Currently I use tech.ml.dataset to convert Clojure columnar data into the Arrow format for processing in C++....

see https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/tribuo.20prediction.20datatype.20does.20not.20match ```clojure (ns scicloj.ml.tribuo (:require [tech.v3.dataset :as ds] [tech.v3.dataset.modelling :as ds-model] [tech.v3.libs.tribuo :as tribuo]) (:import (com.oracle.labs.mlrg.olcut.config DescribeConfigurable) (org.tribuo.classification.sgd.linear LogisticRegressionTrainer))) (def logreg-trainer (LogisticRegressionTrainer.)) (def dummy-ds (-> (ds/->dataset {:x [1 1]...