Eljas Roellin

94 comments by Eljas Roellin

Yes, this can absolutely be extended to devising and adding more (well-performing) imputation strategies!

Or even preparing larger synthetic datasets, or ones that are well known in the imputation literature, and comparing different methods (including new ones) for performance, runtime, memory requirements, failure modes...
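
A minimal sketch of what such a comparison could look like, assuming a plain list-of-rows matrix rather than ehrapy's AnnData objects: hide a random fraction of known values, impute them, and score each method by RMSE on the hidden entries plus wall-clock runtime. The helpers `mask_values`, `column_stat_impute` and `benchmark` are hypothetical names for illustration only; memory usage is not measured here.

```python
import math
import random
import statistics
import time
from typing import Callable, Dict, List, Optional, Tuple

Matrix = List[List[Optional[float]]]  # None marks a missing value


def mask_values(data: List[List[float]], frac: float, seed: int = 0) -> Tuple[Matrix, List[Tuple[int, int]]]:
    """Hide a random fraction of entries (set to None); return the masked copy and hidden positions."""
    rng = random.Random(seed)
    masked: Matrix = [list(row) for row in data]
    hidden = []
    for i, row in enumerate(data):
        for j in range(len(row)):
            if rng.random() < frac:
                masked[i][j] = None
                hidden.append((i, j))
    return masked, hidden


def column_stat_impute(stat: Callable[[List[float]], float]) -> Callable[[Matrix], List[List[float]]]:
    """Build an imputer filling each column's gaps with `stat` of its observed values.
    Raises statistics.StatisticsError if a column has no observed values."""
    def impute(x: Matrix) -> List[List[float]]:
        fills = [stat([row[j] for row in x if row[j] is not None]) for j in range(len(x[0]))]
        return [[fills[j] if v is None else v for j, v in enumerate(row)] for row in x]
    return impute


def rmse_on_hidden(truth: List[List[float]], imputed: List[List[float]], hidden: List[Tuple[int, int]]) -> float:
    """Root-mean-square error restricted to the entries that were hidden."""
    return math.sqrt(sum((truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden) / len(hidden))


def benchmark(data: List[List[float]], imputers: Dict[str, Callable], frac: float = 0.2, seed: int = 0):
    """Return {name: (rmse, seconds)} for each imputer on the same masked copy of `data`."""
    masked, hidden = mask_values(data, frac, seed)
    results = {}
    for name, imputer in imputers.items():
        start = time.perf_counter()
        imputed = imputer(masked)
        elapsed = time.perf_counter() - start
        results[name] = (rmse_on_hidden(data, imputed, hidden), elapsed)
    return results


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic data: one roughly normal feature and one skewed (exponential) feature.
    data = [[rng.gauss(0.0, 1.0), rng.expovariate(1.0)] for _ in range(200)]
    imputers = {
        "mean": column_stat_impute(statistics.mean),
        "median": column_stat_impute(statistics.median),
    }
    for name, (rmse, secs) in benchmark(data, imputers).items():
        print(f"{name}: RMSE={rmse:.3f}, runtime={secs * 1e3:.2f} ms")
```

Every method sees the same masked copy, so differences in RMSE come from the methods rather than from different missingness patterns; swapping in other imputers (or other masking schemes, e.g. missing-not-at-random) is how failure modes could be probed.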

At first glance, dask-ml does not have a k-nearest-neighbors imputer like [sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html). Having one here would be a significant addition. This is a non-trivial effort, and some engineering is...
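
To make the idea concrete, here is a minimal, brute-force sketch of k-nearest-neighbors imputation in the spirit of sklearn's `KNNImputer`; it is not sklearn's or dask-ml's code. Assumptions: rows are Python lists with `None` for missing values, distances use the "nan-euclidean" idea (Euclidean over shared observed coordinates, scaled up for the missing ones), donors are averaged with uniform weights, and the column-mean fallback when no donor exists is my own choice. `nan_euclidean` and `knn_impute` are hypothetical names.

```python
import math
from typing import List, Optional

Row = List[Optional[float]]  # None marks a missing value


def nan_euclidean(a: Row, b: Row) -> Optional[float]:
    """Euclidean distance over coordinates observed in both rows, scaled by
    (total coords / shared coords). Returns None if the rows share no coordinate."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(len(a) / len(shared) * sq)


def knn_impute(x: List[Row], k: int = 2) -> List[List[float]]:
    """Fill each missing entry with the mean of that feature over the k nearest donor rows."""
    col_means = []
    for j in range(len(x[0])):
        obs = [row[j] for row in x if row[j] is not None]
        if not obs:
            raise ValueError(f"column {j} has no observed values")
        col_means.append(sum(obs) / len(obs))
    out = []
    for i, row in enumerate(x):
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)
                continue
            # Donors: other rows that observed feature j and share at least one feature with row i.
            donors = []
            for r, other in enumerate(x):
                if r == i or other[j] is None:
                    continue
                d = nan_euclidean(row, other)
                if d is not None:
                    donors.append((d, other[j]))
            if donors:
                donors.sort(key=lambda t: t[0])  # stable: ties keep row order
                nearest = donors[:k]
                new_row.append(sum(val for _, val in nearest) / len(nearest))
            else:
                new_row.append(col_means[j])  # fallback when no usable donor exists
        out.append(new_row)
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0], [2.0, None], [3.0, 6.0], [10.0, 20.0]]
    print(knn_impute(x, k=2))
```

Note that this compares every incomplete row against every other row, i.e. it is quadratic in the number of rows; doing this chunked and out-of-core is presumably where most of the engineering for a dask-ml version would go.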

- Longitudinal behaviour of a continuous variable for different individuals from previous work [here](https://github.com/theislab/ehrdata/blob/502662cc7b054308b3af143b6efdc353a5a2c9e6/notebooks/Quickstart.ipynb)

Thanks @dehall, this is exactly what I was wondering! For the COVID datasets (CSV files only), I'd indeed be interested in whether this can be figured out... In any case, thanks for...

> @eroell, what do you think?

See Phil's comment above; one more thing would be to add a test, I suppose. If you want to try Phil's comments yourself, @farhadmd7...

Trying to narrow the discussion of this issue down by mentioning `ep.pp.encode` explicitly. SciPy's sparse arrays accept only numeric data, not string or object dtypes. So `ep.pp.encode` would not...
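
To illustrate why encoding has to happen before anything goes into a sparse array, here is a minimal plain-Python sketch: a string column is one-hot encoded into purely numeric indicator columns, which can then be stored in compressed sparse row form (the same `data`/`indices`/`indptr` triplet that `scipy.sparse.csr_matrix((data, indices, indptr), shape=...)` accepts). This is not how `ep.pp.encode` is implemented; `one_hot_encode`, `to_csr` and the example categories are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def one_hot_encode(values: Sequence[str]) -> Tuple[List[str], List[List[float]]]:
    """Turn a string column into numeric indicator columns, one per (sorted) category."""
    categories = sorted(set(values))
    index: Dict[str, int] = {c: k for k, c in enumerate(categories)}
    rows = [[0.0] * len(categories) for _ in values]
    for i, v in enumerate(values):
        rows[i][index[v]] = 1.0
    return categories, rows


def to_csr(dense: List[List[float]]) -> Tuple[List[float], List[int], List[int]]:
    """Compressed sparse row triplets (data, indices, indptr), keeping only non-zeros."""
    data: List[float] = []
    indices: List[int] = []
    indptr: List[int] = [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0.0:
                data.append(v)
                indices.append(j)
        indptr.append(len(data))
    return data, indices, indptr


if __name__ == "__main__":
    categories, rows = one_hot_encode(["never", "former", "current", "never"])
    print(categories)
    print(to_csr(rows))
```

One-hot columns are mostly zeros, so after encoding they are a natural fit for sparse storage; the raw strings, in contrast, have no numeric representation a sparse array could hold.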

The last-measurement-carried-forward imputation strategy (commonly called last observation carried forward, LOCF) would be yet another imputation strategy worth having at hand. A more detailed discussion of a case study is available, e.g., [here](https://bmcmedresmethodol.biomedcentral.com/articles/10.1186/s12874-023-02125-x).
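
A minimal sketch of LOCF, assuming long-format records of `(individual_id, time, value)` with `None` for missing values; this layout and the `locf` helper are hypothetical and not ehrapy's data model. The key point is that values are carried forward only within one individual's time-ordered measurements, never across individuals.

```python
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, float, Optional[float]]  # (individual_id, time, value); None if missing


def locf(records: List[Record]) -> List[Record]:
    """Last observation carried forward, applied separately per individual in time order.
    Gaps before an individual's first observation stay None."""
    by_id: Dict[str, List[Record]] = {}
    for rec in records:
        by_id.setdefault(rec[0], []).append(rec)
    out: List[Record] = []
    for pid in sorted(by_id):
        last: Optional[float] = None  # reset for every individual
        for _, t, v in sorted(by_id[pid], key=lambda r: r[1]):
            if v is not None:
                last = v
            out.append((pid, t, v if v is not None else last))
    return out


if __name__ == "__main__":
    records = [
        ("a", 0.0, 5.0), ("a", 1.0, None), ("b", 0.0, None),
        ("b", 1.0, 7.0), ("a", 2.0, None), ("b", 2.0, None),
    ]
    for rec in locf(records):
        print(rec)
```

Leading gaps are deliberately left missing rather than back-filled, since there is no earlier measurement to carry; those could be handed to another strategy afterwards.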

This has been on the [time-series branch](https://github.com/theislab/ehrapy/compare/main...feature/time-series#diff-ff3577b4b4d11b2855db409cdb8e2cc7fdaa9089716bbadf1980f9c9efe4246fR29), and we can make it happen now. There is already a first implementation of this, plus a test, on the time-series branch, ready...

Sounds good to me :+1: