sparse encoding

Open Zethson opened this issue 1 year ago • 0 comments

Description of feature

I looked a bit into sparse encoding, with one-hot encoding being the most important case:

  1. scikit-learn's one-hot encoding supports a sparse_output parameter that should return a CSR matrix when enabled.
  2. We get original_values as NumPy arrays when calling the function, which may or may not be fine.
  3. We currently default the sparse_output parameter to False without checking the type of the matrix (sparse or dense).
  4. _update_encoded_data does not take sparse matrices into account.

Zethson, Feb 03 '24 12:02