tech.ml.dataset
A Clojure high performance data processing system
It surprised me, at least, to see that the shape is flipped when printing the dataset: https://github.com/techascent/tech.ml.dataset/blob/b0896cc6116ad6aa049fb7f1b955e9fe49b07ae8/src/tech/v3/dataset/print.clj#L336 so I am reporting it here.
Either by implementing the interface org.tribuo.math.la.Matrix on top of a tensor, or by creating a DenseMatrix from a tech.v3.tensor. As we already have Tribuo interop in here, it might make sense.
The documentation lists ` [org.dhatim/fastexcel-reader "0.12.8" :exclusions [org.apache.poi/poi-ooxml]]` as a required dependency. Using this dependency results in a reflection warning. The documentation should be updated to require ``` org.dhatim/fastexcel-reader {:mvn/version...
Like this file: https://github.com/haifengl/smile/blob/v2.6.0/shell/src/universal/data/json/books1.json It is technically not valid JSON. I think this format is usually called "jsonl" (JSON Lines).
According to the release notes, there should be no breaking changes: https://github.com/oracle/tribuo
From Zulip - https://clojurians.zulipchat.com/#narrow/stream/236259-tech.2Eml.2Edataset.2Edev/topic/rolling.20linear.20regression/near/430506985 Have a rolling window type that does not produce fixed window sizes but instead is faithful to the original amount of data.
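One possible reading of this request is a centered rolling window that shrinks at the edges of the data instead of padding, so each result is computed only from values that actually exist. The sketch below illustrates that reading in plain Clojure. `shrinking-rolling-mean` is a hypothetical helper, not part of tech.ml.dataset, and it assumes an odd nominal window size.

```clojure
;; Hypothetical helper, NOT part of tech.ml.dataset: a centered rolling mean
;; whose window is truncated at the edges rather than padded, so every output
;; uses only the data points that actually exist.
(defn shrinking-rolling-mean
  "Returns a vector of centered rolling means over `xs` with nominal window
  size `window` (assumed odd). Near either edge the window is truncated to
  the available elements, so its effective size varies."
  [window xs]
  (let [v    (vec xs)
        n    (count v)
        half (quot window 2)]
    (mapv (fn [i]
            (let [start (max 0 (- i half))        ; clamp left edge
                  end   (min n (+ i half 1))      ; clamp right edge (exclusive)
                  vals  (subvec v start end)]
              ;; divide by the actual number of values, not by `window`
              (/ (reduce + vals) (count vals))))
          (range n))))
```

For example, `(shrinking-rolling-mean 3 [1 2 3 4 5])` averages only two values at each end, giving `[3/2 2 3 4 9/2]`.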
Thanks for making this cool Clojure library. I only discovered it today and look forward to playing with it. I was wondering whether this library already supports or plans to...
When trying to write a parquet file with :decimal types, I receive the following error: ``` ; Execution error at tech.v3.libs.parquet/column->field (parquet.clj:886). ; Unsupported datatype for parquet writing: :decimal ```...
I am saving datasets as transit-json. When I create a dataset with a zoned-date-time column and then convert the type to instant and try to save this dataset as transit-json,...
When transforming the categorical columns of a dataset to numeric via ds/categorical->number, the attached categorical map contains a field "result-datatype" (which can also be specified manually). This is...
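To make the role of a result datatype concrete, here is a plain-Clojure sketch of what a categorical-to-number mapping does conceptually: build a lookup table from distinct values to integer indices, then convert each index to the requested numeric type. This is an illustration under assumptions, not tech.ml.dataset's implementation. `build-lookup-table` and `encode` are hypothetical names, and passing a conversion function such as `double` merely stands in for a result datatype.

```clojure
;; Hypothetical sketch, NOT tech.ml.dataset's implementation of
;; ds/categorical->number.
(defn build-lookup-table
  "Maps each distinct value in `xs`, in sorted order, to an integer index."
  [xs]
  (zipmap (sort (distinct xs)) (range)))

(defn encode
  "Encodes `xs` using `lookup`; `result-fn` converts each integer index to
  the desired result type (e.g. `double`), playing the role of a result
  datatype."
  [lookup result-fn xs]
  (mapv (comp result-fn lookup) xs))
```

With `["b" "a" "b" "c"]` the lookup table is `{"a" 0, "b" 1, "c" 2}`, and encoding `["b" "a" "c"]` with `double` yields `[1.0 0.0 2.0]`.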