ehrapy
sparse encoding
Description of feature
I looked a bit into sparse encoding, with one-hot encoding being the most important case:
- scikit-learn's one-hot encoder supports a `sparse_output` parameter that should return a CSR matrix.
- We're getting `original_values` as NumPy arrays when calling the function. This may or may not be fine.
- Currently we default the `sparse_output` parameter to `False` without checking the type of matrix.
- `_update_encoded_data` does not take sparse matrices into account.