sensAI
"Vectorize" ColumnGeneratorCachedByIndex
ColumnGeneratorCachedByIndex is recommended for new cached column generators, but it can be significantly slower than the discouraged approach of first creating a ColumnGenerator and then adding a cache by wrapping it with IndexCachedColumnGenerator.
The reason is that IndexCachedColumnGenerator finds all non-cached values and then processes them at once (i.e., batch-wise), whereas ColumnGeneratorCachedByIndex always loops through all values one by one. For the initial filling of the cache, when every value is missing, this can be much slower.
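To make the difference concrete, here is a minimal sketch of the two cache-filling strategies. It does not use sensAI at all: fill_per_value, fill_batch, and the plain dict cache are hypothetical stand-ins, and the real classes may differ in the details.

```python
from typing import Any, Callable, Dict, Hashable, List, Sequence


def fill_per_value(
    indices: Sequence[Hashable],
    cache: Dict[Hashable, Any],
    generate_value: Callable[[Hashable], Any],
) -> List[Any]:
    """Loop over every index; call the generator once per cache miss."""
    result = []
    for idx in indices:
        if idx not in cache:
            cache[idx] = generate_value(idx)  # one call per missing value
        result.append(cache[idx])
    return result


def fill_batch(
    indices: Sequence[Hashable],
    cache: Dict[Hashable, Any],
    generate_values: Callable[[List[Hashable]], List[Any]],
) -> List[Any]:
    """Collect all cache misses first, then generate them in a single call."""
    # dict.fromkeys de-duplicates while keeping the original order
    missing = [idx for idx in dict.fromkeys(indices) if idx not in cache]
    if missing:
        cache.update(zip(missing, generate_values(missing)))  # one batched call
    return [cache[idx] for idx in indices]


# Toy generators that record how often they are invoked
per_value_calls: List[Hashable] = []
batch_calls: List[List[Hashable]] = []


def square(idx):
    per_value_calls.append(idx)
    return idx * idx


def square_all(idxs):
    batch_calls.append(list(idxs))
    return [i * i for i in idxs]


indices = [1, 2, 3, 2]
per_value_result = fill_per_value(indices, {}, square)
batch_result = fill_batch(indices, {}, square_all)
```

Both variants return the same values, but the per-value variant invokes the generator once for each missing index, while the batch variant invokes it once in total. If the generator is vectorizable (e.g. a model doing batched inference), that single call is where the speedup would come from.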
Not sure what to do here. One would need to redesign ColumnGeneratorCachedByIndex to not use _generate_value, but that's a breaking change. Another way would be to write a new class à la VectorizedColumnGeneratorCachedByIndex, but I honestly feel like batch-wise processing of missing values should be the default behavior.
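One possible shape for such a class, purely as a sketch: a batch hook whose default implementation falls back to the per-value method, so existing subclasses would keep working while new ones can override the batch hook. Everything here is hypothetical (BatchCachedGeneratorSketch, _generate_values, generate, the dict cache); only the name _generate_value is borrowed from the existing class, and its signature here is an assumption.

```python
from typing import Any, Dict, Hashable, List, Sequence


class BatchCachedGeneratorSketch:
    """Hypothetical stand-in, not the sensAI class."""

    def __init__(self) -> None:
        self._cache: Dict[Hashable, Any] = {}

    def _generate_value(self, index: Hashable) -> Any:
        # Per-value hook (assumed signature); legacy subclasses override this
        raise NotImplementedError

    def _generate_values(self, indices: List[Hashable]) -> List[Any]:
        # Batch hook; the default simply loops, so legacy subclasses still work
        return [self._generate_value(i) for i in indices]

    def generate(self, indices: Sequence[Hashable]) -> List[Any]:
        # Find all cache misses first, then fill them with one batch call
        missing = [i for i in dict.fromkeys(indices) if i not in self._cache]
        if missing:
            self._cache.update(zip(missing, self._generate_values(missing)))
        return [self._cache[i] for i in indices]


class LegacyLength(BatchCachedGeneratorSketch):
    """Existing-style subclass: only implements the per-value hook."""

    def _generate_value(self, index):
        return len(str(index))


class VectorizedLength(BatchCachedGeneratorSketch):
    """New-style subclass: overrides the batch hook directly."""

    def __init__(self) -> None:
        super().__init__()
        self.batch_sizes: List[int] = []  # records the size of each batch call

    def _generate_values(self, indices):
        self.batch_sizes.append(len(indices))
        return [len(str(i)) for i in indices]


legacy = LegacyLength()
vectorized = VectorizedLength()
legacy_result = legacy.generate(["a", "bb", "a"])
first_result = vectorized.generate(["a", "bb", "a"])
second_result = vectorized.generate(["a", "ccc"])  # only "ccc" is missing now
```

Whether this would actually avoid a breaking change depends on how the real class stores and exposes its cache, which this sketch does not model.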