DESDEO
Tensor support for polars evaluator
PR #143 adds support for vector-valued variables and constants to the Polars evaluator, but the evaluator still cannot handle higher-dimensional (tensor) variables.
Observed reasons/attempted solutions for this:
- When the values of a higher-dimensional variable are kept in a single column (Polars Series), the list of lists is represented as Polars Lists in that column. These Polars Lists are not compatible with Python lists or NumPy arrays, so no calculations can be conducted with them.
- If we instead flatten a higher-dimensional variable into vectors and represent it as multiple columns (Series), the problem is that the function expressions in the problem formulation refer to the variable as one column, not to the flattened columns. So how would we parse the expressions in a way that takes the higher dimensions into account?
An example: we have a 2x2 variable X and a 2x2 constant A, and we want their matrix product, A@X. This is parsed into the expression `column("X").python_udf()`. (The `python_udf()` is basically a placeholder for the matrix multiplication, since Polars does not have its own function for it; `numpy.matmul` is used here, and it seems to work fine with vectors.)
Then, when the values of X are given as a 2x2 matrix, they are stored in a Polars DataFrame as a Series consisting of a Polars List, which is not compatible with the Python list (of lists) or NumPy array representing A. The following error occurs: `polars.exceptions.ComputeError: NotImplementedError: conversion of polars data type List to C-type not implemented`.
Alternatively, we could store the values of X in multiple columns of the DataFrame, for example as columns X_1 and X_2, both being vectors. However, passing these columns to the expression with `obj_col = agg_df.select(expr.alias(symbol))` results in an error, because the expression refers to `column("X")`, not to `column("X_1")` and `column("X_2")`.