pysurvival Make pysurvival work with scikit-learn

I have noticed that PySurvival does not really follow the principles of scikit-learn, starting with the fact that you input X, T, E instead of X, y. Furthermore, GridSearchCV cannot be used, both because of that signature mismatch and because the model objects have no set_params method. (The same applies to scikit-learn's Pipeline, which only works after extensive reworking of many classes and functions in scikit-learn.) I think it is very unfortunate that this great package stays outside of sklearn. Is there any plan to fix this and make PySurvival connectable to scikit-learn? Or am I missing something?
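To make the mismatch concrete, here is a minimal sketch of the kind of adapter that would be needed. It is only an illustration: `DummySurvivalModel` and `XyAdapter` are hypothetical names, the dummy model merely stands in for any estimator with a `fit(X, T, E)` signature, and packing `y` as a sequence of `(time, event)` pairs is an assumption, not something PySurvival defines.

```python
class DummySurvivalModel:
    """Hypothetical stand-in for a model whose fit takes X, T, E."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def fit(self, X, T, E):
        # Record two simple summaries so the example has visible state.
        self.n_events_ = sum(1 for e in E if e)
        self.max_time_ = max(T)
        return self


class XyAdapter:
    """Hypothetical adapter exposing fit(X, y), get_params and set_params."""

    def __init__(self, model_cls, **model_params):
        self.model_cls = model_cls
        self.model_params = dict(model_params)

    def get_params(self, deep=True):
        # Return the wrapped class plus its constructor arguments.
        return {"model_cls": self.model_cls, **self.model_params}

    def set_params(self, **params):
        # Route model_cls to the adapter, everything else to the wrapped model.
        if "model_cls" in params:
            self.model_cls = params.pop("model_cls")
        self.model_params.update(params)
        return self

    def fit(self, X, y):
        # Assumption: y is a sequence of (time, event) pairs.
        T = [t for t, _ in y]
        E = [e for _, e in y]
        self.model_ = self.model_cls(**self.model_params).fit(X, T, E)
        return self


adapter = XyAdapter(DummySurvivalModel, lr=0.1).set_params(lr=0.5)
adapter.fit([[1.0], [2.0], [3.0]], [(5.0, 1), (3.0, 0), (8.0, 1)])
```

With `get_params` and `set_params` in place, a search utility can rebuild the model with different settings, which is the piece the X, T, E interface is missing.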

Oct 24 '19 16:10 pransito

FYI, I'm working on a solution to this issue. I expect to have something in a few days.

Jan 28 '20 01:01 bacalfa

I'm happy to announce that I think I have a clean solution to this issue. Please pull from master in my forked repository: https://github.com/bacalfa/pysurvival.

If you installed it with setup.py, first uninstall the current version with:

python -m pip uninstall pysurvival

Then reinstall it:

python setup.py build_ext --inplace (to rebuild the package)
python setup.py install --user (to install the files to your local directories)

Make sure to check out the new notebook explaining the new feature. Comments and feedback are welcome!

Jan 29 '20 04:01 bacalfa

Omg thank you so much! hahaha

Jan 29 '20 11:01 camferna

Hi bacalfa,

Beginner coder here. I've been trying to follow your instructions above to install pysurvival on Windows 10. I've tried downloading the zip file and cloning it with git clone. I've also checked to make sure I have MSVC14. Each time I run into the following issue:

c1xx: fatal error C1083: Cannot open source file: 'pysurvival/cpp_extensions/_functions.cpp': No such file or directory

Any advice would be greatly appreciated. Really looking forward to trying this package out!

Mar 21 '20 06:03 JCCKwong

@JCCKwong, can you give more details on the steps you're taking and what happens after you execute them? Also, did you clone my forked repository (https://github.com/bacalfa/pysurvival) instead of the one from the original author (https://github.com/square/pysurvival)?

Mar 21 '20 13:03 bacalfa

@bacalfa yes, I cloned your forked repository. Here's a screenshot of what I did on Anaconda 3.

[Screenshot: Issue]

Mar 21 '20 17:03 JCCKwong

First change the current directory to C:\Users\Jethro\pysurvival.

cd C:\Users\Jethro\pysurvival

Then run the python commands described above in https://github.com/square/pysurvival/issues/15#issuecomment-579584083.

Mar 21 '20 17:03 bacalfa

It worked, thanks! Really appreciated your help!

Mar 21 '20 17:03 JCCKwong

Can it work with StratifiedKFold?

Apr 17 '20 07:04 DashengSong

@DashengSong, have you tried it? I don't think I have.

Apr 17 '20 22:04 bacalfa

Hi @bacalfa. Thanks for creating a package that can be installed on Windows. I'm trying to use the sklearn compatibility feature you've added. Does it work with the random survival forest estimator too?

Apr 28 '20 18:04 KaranMehta21

@KaranMehta21, I think it does. But there may be a caveat: https://github.com/square/pysurvival/issues/17.

Apr 28 '20 21:04 bacalfa

@bacalfa OK, I'll try it out. Is the benefit of using it that cross-validation and hyperparameter tuning become possible, and will that lead to higher c-indices? Currently, the RSF model I've trained has a c-index of 0.71. I'm looking for ways to bring it closer to 0.80. Any suggestions?

Apr 28 '20 22:04 KaranMehta21

Honestly, I haven't used this package that much, so I'm not sure what to suggest. There are simpler and more complex models. It's a good habit to evaluate performance with a validation set (like in CV) and perform hyperparameter tuning. Difficult to know which algorithm will be the best a priori. So try (and tune) as many as you can, and make sure you make a fair comparison between them.
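For a fair comparison, every model should be scored with the same metric on the same held-out data. As a rough sketch of what the c-index measures, here is a plain-Python version of Harrell's concordance index. This is an illustration only, not the implementation used by pysurvival; the function name and argument order are assumptions.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an observed event and
    a shorter time than subject j. It is concordant when subject i also
    has the higher predicted risk. Tied risk scores count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs; c-index is undefined")
    return concordant / comparable


times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.5, 0.1])
reversed_order = concordance_index(times, events, [0.1, 0.5, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives some context for a gap such as 0.71 versus 0.80.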

Apr 28 '20 23:04 bacalfa

Hi all, I would really appreciate it if anyone could help me. I have downloaded the package to C:\Users\User\Downloads\pysurvival-master, and Anaconda is installed at C:\Users\User. Below are the steps I think I need to follow; please guide me so that I can carry out the installation correctly.

Step-1: Create a directory: C:\Users\User\pysurvival (as Anaconda is installed in C:\Users\User)
Step-2: Copy all contents from C:\Users\User\Downloads\pysurvival-master to C:\Users\User\pysurvival (now setup.py is in this location)
Step-3: Navigate to C:\Users\User\pysurvival (using command prompt)
Step-4: Run the 2 commands below:

python setup.py build_ext --inplace (to rebuild the package)
python setup.py install --user (to install the files to your local directories)

May 17 '20 07:05 SurajitTest

Hi @bacalfa ,

I've tried your fork with the setup.py

Unfortunately, it is still not working for me because of this line:

extra_compile_args = ["/O2"]

The error occurs when building the 'pysurvival.utils._functions' extension:

gcc: error: /O2: No such file or directory
error: command 'C:\MinGW\bin\gcc.exe' failed with exit status 1

Thanks!

Sep 23 '20 15:09 CoteDave

@CoteDave, I don't have MinGW installed on my Windows machine (and it's not easy to do so). The error seems to suggest that /O2 is an option for the MS C/C++ compiler, which isn't recognized by MinGW. If you change line 61 in setup.py to the same thing as in line 63, I think it'd work. Let me know.
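As a sketch of how setup.py could avoid this kind of mismatch, the flags can be chosen by compiler type rather than by operating system. The helper name `compile_args_for` is hypothetical, and the GCC-style flag below is an assumption; use whatever line 63 of setup.py actually contains.

```python
# Optimization flags keyed by the compiler type that setuptools/distutils reports.
# "/O2" is the MSVC spelling. The GCC-style value is an assumption.
COMPILE_ARGS_BY_COMPILER = {
    "msvc": ["/O2"],
    "mingw32": ["-O2"],
    "unix": ["-O2"],
}


def compile_args_for(compiler_type):
    """Return extra_compile_args for a compiler type, defaulting to GCC-style."""
    return list(COMPILE_ARGS_BY_COMPILER.get(compiler_type, ["-O2"]))


msvc_args = compile_args_for("msvc")
mingw_args = compile_args_for("mingw32")
```

Inside a build_ext subclass, the compiler type is available as self.compiler.compiler_type once the compiler has been set up, so the lookup could be done in build_extensions before the extensions are compiled.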

Sep 23 '20 22:09 bacalfa

Hi @bacalfa, I changed line 61.

No more /O2 error, but sadly, a new error occurs at the same place:

building 'pysurvival.utils._functions' extension
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: C:...\Anaconda3\libs/libpython38.dll.a: error adding symbols: file format not recognized
collect2.exe: error: ld returned 1 exit status
error: command 'C:\MinGW\bin\g++.exe' failed with exit status 1

Sep 24 '20 13:09 CoteDave

@CoteDave, that error looks similar to this one. See the suggestion there.

Sep 24 '20 16:09 bacalfa

I would also like to make a suggestion. Could you please include a Lasso regularization term in the loss functions of the Linear Multi-Task Logistic Regression and Linear SVM models, so that, as in sklearn, one can do Ridge, Lasso or ElasticNet regularization? It would amount to adding a new parameter called "penalizer" such that line 191 of multi_task.py reads:

loss += penalizer * (l2_reg * torch.sum(w*w)/2. + (1.0 - l2_reg) * torch.sum(np.sqrt(w*w)))

Therefore, if l2_reg=1, one is doing Ridge regularization, if l2_reg=0 one is doing Lasso regularization, and when 0<l2_reg<1 one is doing ElasticNet.
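To show the three regimes numerically, here is a minimal plain-Python sketch of that penalty term. It is only an illustration with a hypothetical function name; the real change would operate on torch tensors inside the model's loss.

```python
def elastic_net_penalty(weights, penalizer, l2_reg):
    """penalizer * (l2_reg * sum(w^2)/2 + (1 - l2_reg) * sum(|w|))."""
    if not 0.0 <= l2_reg <= 1.0:
        raise ValueError("l2_reg must lie in [0, 1]")
    ridge_term = sum(w * w for w in weights) / 2.0
    lasso_term = sum(abs(w) for w in weights)
    return penalizer * (l2_reg * ridge_term + (1.0 - l2_reg) * lasso_term)


w = [1.0, -2.0]
ridge_only = elastic_net_penalty(w, penalizer=0.5, l2_reg=1.0)   # pure Ridge
lasso_only = elastic_net_penalty(w, penalizer=0.5, l2_reg=0.0)   # pure Lasso
mixed = elastic_net_penalty(w, penalizer=0.5, l2_reg=0.5)        # ElasticNet
```

Here penalizer scales the whole term, while l2_reg only controls the mix between the squared and absolute-value parts.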

Oct 22 '20 22:10 elopezfune

@elopezfune, regarding your error, see if this helps.

I'll see if I can help with the regularization request and will let you know.

Oct 22 '20 22:10 bacalfa

@elopezfune, I'd prefer to create a branch for this request. Let's call it elastic_net_loss.

For MTLR, I'd do:

loss += l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(torch.abs(w))

For consistency, I should probably apply the same change to other models. For SVM, that would require modifying Cython code (file _svm.pyx). I'll need more time to make sure I understand what changes to make. Any help is welcome. I'm actually not a user of this package at the moment. Just trying to help maintain it for others. :)

Oct 23 '20 03:10 bacalfa

Thanks for the quick answer. I believe ElasticNet will give the users more flexibility to optimize survival models.

Oct 23 '20 08:10 elopezfune

Yes, a line of code like this is perfect! loss += l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(torch.abs(w))

I tried once to change it manually on the local files, but I didn't have access to the optimization code (Cython), therefore, it didn't work.

Oct 23 '20 08:10 elopezfune

Well, indeed, a new parameter needs to be introduced, named penalizer or something similar, which sets the overall strength of the penalty. l2_reg would then choose between Ridge, Lasso and ElasticNet.

Oct 23 '20 08:10 elopezfune

The following packages have unmet dependencies:
  python3-dev: Depends: libpython3-dev (= 3.8.2-0ubuntu2) but it is not going to be installed
               Depends: python3.8-dev (>= 3.8.2-1~) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

What Python version do you have installed? It seems to be suggesting that you should have at least 3.8 to be able to install libpython3-dev.

These errors you're experiencing are specific to your Ubuntu system, not really to pysurvival. Once you have all the dependencies installed, you should be able to build pysurvival.

Oct 23 '20 13:10 bacalfa

I have Python 3.8.6

Oct 23 '20 13:10 elopezfune

You'll have to do some searching on the errors you're getting. I can't reproduce it because I currently don't have access to Ubuntu. See this.

Oct 23 '20 13:10 bacalfa

Thanks, I'm trying to solve this problem; it is driving me crazy.

Oct 23 '20 14:10 elopezfune

Adding support for l1 regularization to SVM isn't trivial. It requires modifications to Cython code (doable), but I can't find the reference for the formulation. And I don't have a lot of time to spend on this. If anyone would like to contribute or help, that'd be appreciated. SVM in this package doesn't use PyTorch (loss, gradient, and Hessian are manually implemented in Cython, so it's important to know the full formulation in order to modify it).

Oct 24 '20 03:10 bacalfa
