Sebastian Pölsterl
Feature importances based on node/split statistics are rather flawed (see e.g. [this paper](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-8-25)). Therefore, I'm hesitant to implement this feature. In particular, you can already compute permutation-based feature importance via...
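To illustrate the idea of permutation-based importance, here is a minimal, library-free sketch. It is not the tool referred to above: `toy_predict` is a hypothetical stand-in for a fitted model's risk prediction, the data are synthetic, and the concordance index is a simplified Harrell's C that ignores tied times.

```python
import random


def concordance_index(times, events, risk):
    """Simplified Harrell's C: share of comparable pairs ranked correctly by risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable


def permutation_importance(predict, X, times, events, n_repeats=20, seed=0):
    """Mean drop in C-index when a single column of X is shuffled."""
    rng = random.Random(seed)
    baseline = concordance_index(times, events, predict(X))
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            score = concordance_index(times, events, predict(X_perm))
            drops.append(baseline - score)
        importances.append(sum(drops) / n_repeats)
    return baseline, importances


# Hypothetical stand-in for a fitted model: risk depends on feature 0 only.
def toy_predict(X):
    return [row[0] for row in X]


# Synthetic data: a higher value of feature 0 means an earlier event.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(60)]
times = [1.0 - row[0] for row in X]
events = [True] * len(X)

baseline, importances = permutation_importance(toy_predict, X, times, events)
```

Shuffling feature 0 destroys the ranking and lowers the score, while shuffling feature 1 changes nothing because the toy model never reads it. Unlike split statistics, this only needs a prediction function and a score, so it works with any estimator.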
I agree that it would be nice. The reason why it is not included is that sklearn's Cython implementation expects that a split criterion computes an error/impurity per node from...
Currently, `proxy_impurity_improvement` and `impurity_improvement` just return the log-rank test statistic. `min_impurity_decrease` is [used in sklearn's tree builder](https://github.com/scikit-learn/scikit-learn/blob/7ed972193590c2a11839e15db87fa4818089de1a/sklearn/tree/_tree.pyx#L230-L231); the improvement value is the return value of the criterion's `impurity_improvement` function, [set...
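For readers unfamiliar with the quantity involved, the following is a rough sketch of the textbook two-sample log-rank chi-square statistic for a candidate split. It is not the Cython code in the criterion; the function name `logrank_statistic` and the `in_left` argument are made up for this illustration.

```python
def logrank_statistic(times, events, in_left):
    """Two-sample log-rank chi-square statistic for a candidate split.

    times:   observed time per sample
    events:  True if the event occurred, False if censored
    in_left: True if the sample would go to the left child
    """
    observed_minus_expected = 0.0
    variance = 0.0
    # Only time points with at least one event contribute.
    for t in sorted({s for s, e in zip(times, events) if e}):
        n_risk = sum(1 for s in times if s >= t)
        n_risk_left = sum(1 for s, g in zip(times, in_left) if s >= t and g)
        d = sum(1 for s, e in zip(times, events) if s == t and e)
        d_left = sum(
            1 for s, e, g in zip(times, events, in_left) if s == t and e and g
        )
        frac_left = n_risk_left / n_risk
        # Observed events in the left child minus their expectation under H0.
        observed_minus_expected += d_left - d * frac_left
        if n_risk > 1:
            # Hypergeometric variance of the number of events in the left child.
            variance += d * frac_left * (1 - frac_left) * (n_risk - d) / (n_risk - 1)
    if variance == 0.0:
        return 0.0
    return observed_minus_expected ** 2 / variance


# A split that separates early from late events scores high ...
separated = logrank_statistic(
    [1, 2, 3, 4, 5, 6, 7, 8], [True] * 8, [True] * 4 + [False] * 4
)
# ... while a split with identical survival in both children scores zero.
balanced = logrank_statistic(
    [1, 1, 2, 2, 3, 3], [True] * 6, [True, False] * 3
)
```

The statistic compares the two children directly and grows with their separation. It is not an impurity computed for each node on its own, so it is not the kind of quantity that a per-node impurity decrease is built from.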
This feature has been requested before and I do plan to add it in the next release (probably in February). In the meantime, please have a look at https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368 for alternative...
Unfortunately no, I couldn't find the time to implement this yet. Any contributions would be appreciated.
Looks perfectly fine to me. As mentioned in https://github.com/sebp/scikit-survival/issues/15#issuecomment-344757368, the only downside is that this way is still subject to the proportional hazards assumption.
I was planning on using the approach proposed by Van Belle et al. Unfortunately, due to other obligations, my time is limited at the moment.
Could you please clarify what you mean by "their length equal to the training dataset"? If you call `predict_survival_function` it will return an array of `StepFunction` instances, which share the...
I guess you are referring to the counting process layout to fit a Cox proportional hazards model with time-dependent variables. Unfortunately, this is currently not supported, because it does require...
The easiest way would be to transform data with time-varying features into a counting process layout such that you have multiple rows corresponding to one subject and each row correspond...
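As a sketch of such a transformation (data layout only, not model fitting), assuming each subject has a list of timestamped measurements plus a final follow-up time and event indicator. The function name and the column names `id`, `start`, `stop`, and `event` are hypothetical choices for this example.

```python
def to_counting_process(subject_id, measurements, end_time, event):
    """Expand one subject into (start, stop] rows, one per measurement interval.

    measurements: list of (time, {feature: value}) pairs, the first at time 0
    end_time:     time of event or censoring
    event:        True if the subject had the event at end_time
    """
    measurements = sorted(measurements, key=lambda m: m[0])
    rows = []
    for k, (start, features) in enumerate(measurements):
        if start >= end_time:
            break  # measurements at or after the end of follow-up are unused
        is_last = k == len(measurements) - 1
        stop = end_time if is_last else min(measurements[k + 1][0], end_time)
        rows.append({
            "id": subject_id,
            "start": start,
            "stop": stop,
            # Only the interval that ends at end_time can carry the event.
            "event": bool(event) and stop == end_time,
            **features,
        })
    return rows


# One subject, blood pressure measured at t=0, 5, and 9, event at t=12.
rows = to_counting_process(
    subject_id=1,
    measurements=[(0, {"bp": 120}), (5, {"bp": 135}), (9, {"bp": 150})],
    end_time=12,
    event=True,
)
```

Each row holds the feature values that were current during its interval, and every row except possibly the last is treated as censored at its `stop` time.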