
tensor generic compute functions

Open rcoreilly opened this issue 5 months ago • 2 comments

Per design discussion in: https://github.com/cogentcore/cogent/discussions/324

The tensor.Indexed type provides the universal representation of a homogeneous data type throughout all the packages here, from scalar to vector, matrix, and beyond. It can represent any kind of element efficiently, with enough flexibility to express a wide range of computations elegantly. The indexes provide a specific view onto the underlying [Tensor] data, applying to the outermost row dimension (with default row-major indexing). For example, sorting and filtering a tensor only requires updating the indexes; the Tensor itself is left untouched.
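To make the "view through indexes" idea concrete, here is a minimal sketch in plain Go. It is not the actual tensor.Indexed implementation: `IndexedView` and its methods are hypothetical names, and the data is simplified to one value per row.

```go
package main

import (
	"fmt"
	"sort"
)

// IndexedView is a hypothetical stand-in for tensor.Indexed: it pairs
// underlying data with a list of row indexes that defines the current view.
type IndexedView struct {
	Data    []float64 // underlying values, one per row; never reordered
	Indexes []int     // rows visible through the view, in view order
}

// NewIndexedView returns a view that exposes every row in its original order.
func NewIndexedView(data []float64) *IndexedView {
	idx := make([]int, len(data))
	for i := range idx {
		idx[i] = i
	}
	return &IndexedView{Data: data, Indexes: idx}
}

// Sort reorders only the indexes so the view reads in ascending order.
func (v *IndexedView) Sort() {
	sort.SliceStable(v.Indexes, func(i, j int) bool {
		return v.Data[v.Indexes[i]] < v.Data[v.Indexes[j]]
	})
}

// Filter keeps only the indexes whose values satisfy keep.
func (v *IndexedView) Filter(keep func(val float64) bool) {
	out := v.Indexes[:0]
	for _, i := range v.Indexes {
		if keep(v.Data[i]) {
			out = append(out, i)
		}
	}
	v.Indexes = out
}

// Values materializes the view as a new slice, for display or testing.
func (v *IndexedView) Values() []float64 {
	vals := make([]float64, len(v.Indexes))
	for i, idx := range v.Indexes {
		vals[i] = v.Data[idx]
	}
	return vals
}

func main() {
	v := NewIndexedView([]float64{3, 1, 4, 1, 5})
	v.Sort()
	v.Filter(func(val float64) bool { return val > 1 })
	fmt.Println(v.Values()) // prints [3 4 5]
	fmt.Println(v.Data)     // prints [3 1 4 1 5]: the data is unchanged
}
```

Because sorting and filtering touch only `Indexes`, several views with different orderings could share the same `Data` without copying it.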

The Vectorize function and its variants provide a universal "apply function to tensor data" mechanism (often called a "map" function, but that name is already taken in Go). It takes three arguments: an N function that determines how many indexes to iterate over (and can also do any initialization prior to iterating); a compute function that receives an index and the list of tensors, and is applied to every index; and a varargs list of indexed tensors. How to interpret the index is entirely up to the compute function. There is a Threaded version for parallelizable functions, and a GPU version.
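The shape of this mechanism can be sketched as follows. The signatures here are assumptions chosen for illustration, not the package's actual API: `Vec`, `NFunc`, `ComputeFunc`, `NFirstLen`, and `AddElem` are hypothetical, and the threaded variant simply splits the index range into chunks.

```go
package main

import (
	"fmt"
	"sync"
)

// Vec is a hypothetical 1D stand-in for an indexed tensor.
type Vec struct {
	Values []float64
}

// NFunc reports how many indexes to iterate over; it may also do setup.
type NFunc func(tsr ...*Vec) int

// ComputeFunc is called once per index with the full list of tensors.
type ComputeFunc func(idx int, tsr ...*Vec)

// Vectorize calls fun for every index in [0, n), where n comes from nfun.
func Vectorize(nfun NFunc, fun ComputeFunc, tsr ...*Vec) {
	n := nfun(tsr...)
	for idx := 0; idx < n; idx++ {
		fun(idx, tsr...)
	}
}

// VectorizeThreaded runs fun over chunks of the index range in parallel.
// It is only safe when fun writes to distinct elements for distinct indexes.
func VectorizeThreaded(threads int, nfun NFunc, fun ComputeFunc, tsr ...*Vec) {
	n := nfun(tsr...)
	if threads < 1 {
		threads = 1
	}
	chunk := (n + threads - 1) / threads // ceiling division
	var wg sync.WaitGroup
	for start := 0; start < n; start += chunk {
		end := start + chunk
		if end > n {
			end = n
		}
		wg.Add(1)
		go func(start, end int) {
			defer wg.Done()
			for idx := start; idx < end; idx++ {
				fun(idx, tsr...)
			}
		}(start, end)
	}
	wg.Wait()
}

// NFirstLen uses the first tensor's length as the iteration count and,
// as its initialization step, sizes the last tensor (the output) to match.
func NFirstLen(tsr ...*Vec) int {
	if len(tsr) == 0 {
		return 0
	}
	n := len(tsr[0].Values)
	out := tsr[len(tsr)-1]
	if len(out.Values) != n {
		out.Values = make([]float64, n)
	}
	return n
}

// AddElem interprets idx as a flat element index: out[idx] = a[idx] + b[idx].
func AddElem(idx int, tsr ...*Vec) {
	tsr[2].Values[idx] = tsr[0].Values[idx] + tsr[1].Values[idx]
}

func main() {
	a := &Vec{Values: []float64{1, 2, 3}}
	b := &Vec{Values: []float64{10, 20, 30}}
	out := &Vec{}
	Vectorize(NFirstLen, AddElem, a, b, out)
	fmt.Println(out.Values) // prints [11 22 33]
}
```

Keeping the iteration count in a separate N function lets the same driver loop serve element-wise operations, row-wise reductions, or anything else, since only the compute function decides what an index means.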

All tensor package functions are registered using a global name-to-function map (Funcs), and can be called by name via tensor.Call or tensor.CallOut, which creates the appropriate output tensors for you. Standard enumerated functions in stats and metrics have a FuncName method that appends the package name, which is how they are registered and called.
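A rough sketch of how such a registry could work is below. Only the names `Funcs`, `Call`, `CallOut`, and `FuncName` come from the description above; the function signature, the `Vec` type, the `Register` helper, and the `stats.Mean` naming form are assumptions for illustration.

```go
package main

import "fmt"

// Vec is a hypothetical 1D stand-in for an indexed tensor.
type Vec struct {
	Values []float64
}

// Func is an assumed signature: one input tensor, one output tensor.
type Func func(in, out *Vec)

// Funcs is the global name-to-function map.
var Funcs = map[string]Func{}

// Register adds fun under name, rejecting duplicate names (hypothetical helper).
func Register(name string, fun Func) error {
	if _, exists := Funcs[name]; exists {
		return fmt.Errorf("function %q is already registered", name)
	}
	Funcs[name] = fun
	return nil
}

// Call looks up name and runs it on caller-supplied input and output.
func Call(name string, in, out *Vec) error {
	fun, ok := Funcs[name]
	if !ok {
		return fmt.Errorf("function %q is not registered", name)
	}
	fun(in, out)
	return nil
}

// CallOut is like Call but creates the output tensor for the caller.
func CallOut(name string, in *Vec) (*Vec, error) {
	out := &Vec{}
	if err := Call(name, in, out); err != nil {
		return nil, err
	}
	return out, nil
}

// Stat enumerates two example statistics.
type Stat int

const (
	Sum Stat = iota
	Mean
)

// String returns the bare name of the statistic.
func (s Stat) String() string {
	if s == Sum {
		return "Sum"
	}
	return "Mean"
}

// FuncName returns the registry key, assumed here to look like "stats.Mean".
func (s Stat) FuncName() string { return "stats." + s.String() }

// sum adds up all input values into a single-element output.
func sum(in, out *Vec) {
	total := 0.0
	for _, v := range in.Values {
		total += v
	}
	out.Values = []float64{total}
}

// mean divides the sum by the count; empty input yields 0 in this sketch.
func mean(in, out *Vec) {
	sum(in, out)
	if n := len(in.Values); n > 0 {
		out.Values[0] /= float64(n)
	}
}

func init() {
	// Registration errors are ignored here because the names are distinct.
	_ = Register(Sum.FuncName(), sum)
	_ = Register(Mean.FuncName(), mean)
}

func main() {
	out, err := CallOut(Mean.FuncName(), &Vec{Values: []float64{1, 2, 3, 4}})
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(out.Values) // prints [2.5]
}
```

Calling by name is what lets a scripting layer or GUI pick a function from a string without linking against every package directly.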

A Table automatically supplies a shared list of row Indexes for its Indexed columns, efficiently allowing all the heterogeneous data columns to be sorted and filtered together.

Added: tmath implements all standard math functions on tensor.Indexed data, including the standard +, -, *, / operators. cosl then calls these functions.

Major cleanup of: stats, which implements a number of different ways of analyzing tensor and table data, including:

- cluster implements agglomerative clustering of items based on metric distance / similarity matrix data.
- convolve convolves data (e.g., for smoothing).
- glm fits a general linear model for one or more dependent variables as a function of one or more independent variables. This encompasses all forms of regression.
- histogram bins data into groups and reports the frequency of elements in the bins.
- metric computes similarity / distance metrics for comparing two tensors, and associated distance / similarity matrix functions, including PCA and SVD analysis functions that operate on a covariance matrix.
- stats provides a set of standard summary statistics on a range of different data types, from basic slices of floats to tensor and table data. It also includes the ability to extract Groups of values and generate statistics for each group, as in a "pivot table" in a spreadsheet.

rcoreilly • Sep 08 '24 23:09