pysurvival
Make pysurvival work with scikit-learn
I have noticed that PySurvival does not really follow the principles of scikit-learn, starting with the fact that you pass X, T, E instead of X, y. Furthermore, GridSearchCV cannot be used, both because of this and because the model objects have no set_params method (see also scikit-learn's Pipeline, which only works after extensive reworking of many classes and functions in scikit-learn). I think it's very unfortunate that this great package stays outside of scikit-learn. Is there any plan to fix this and make PySurvival compatible with scikit-learn? Or am I missing something?
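For reference, here is a minimal sketch of what scikit-learn compatibility involves, assuming scikit-learn and NumPy are installed. It is not the fork's implementation: `make_survival_target`, `SklearnSurvivalAdapter`, and `model_factory` are hypothetical names, and the wrapped model is only assumed to expose `fit(X, T, E, l2_reg=...)` and `predict_risk(X)`. Durations and event flags are packed into one structured array `y` (the same idea scikit-survival uses), and every hyperparameter is a plain `__init__` argument, so `BaseEstimator` supplies the `get_params`/`set_params` methods that GridSearchCV and Pipeline rely on.

```python
import numpy as np
from sklearn.base import BaseEstimator


def make_survival_target(T, E):
    """Pack durations T and event indicators E into one structured array y."""
    y = np.empty(len(T), dtype=[("event", bool), ("time", float)])
    y["event"] = np.asarray(E, dtype=bool)
    y["time"] = np.asarray(T, dtype=float)
    return y


class SklearnSurvivalAdapter(BaseEstimator):
    """Hypothetical adapter exposing fit(X, y) around a model with fit(X, T, E).

    Every hyperparameter is an explicit __init__ argument stored unchanged
    under the same name; BaseEstimator then provides get_params/set_params.
    """

    def __init__(self, model_factory=None, l2_reg=1e-2):
        self.model_factory = model_factory  # callable returning an unfitted model
        self.l2_reg = l2_reg

    def fit(self, X, y):
        # Unpack the structured target back into the (T, E) form the model expects.
        T, E = y["time"], y["event"].astype(int)
        self.model_ = self.model_factory()
        # Assumption: the wrapped model accepts l2_reg as a fit() argument.
        self.model_.fit(np.asarray(X), T, E, l2_reg=self.l2_reg)
        return self

    def predict(self, X):
        # Assumption: the wrapped model exposes predict_risk(X); higher = riskier.
        return self.model_.predict_risk(np.asarray(X))
```

GridSearchCV additionally needs a way to score predictions: either a `score(X, y)` method on the adapter (for example returning a concordance index) or an explicit `scoring` callable.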
FYI, I'm working on a solution to this issue. I expect to have something in a few days.
I'm happy to announce that I think I have a clean solution to this issue. Please pull from master in my forked repository: https://github.com/bacalfa/pysurvival.
If you installed it with setup.py, first uninstall the current version with:
- python -m pip uninstall pysurvival
Then reinstall it:
- python setup.py build_ext --inplace (to rebuild the package)
- python setup.py install --user (to install the files to your local directories)
Make sure to check out the new notebook explaining the new feature. Comments and feedback are welcome!
Omg thank you so much! hahaha
Hi bacalfa,
Beginner coder here. I've been trying to follow your instructions above to install pysurvival on Windows 10. I've tried downloading the zip file and cloning it with git clone. I've also checked to make sure I have MSVC14. Each time I run into the following issue:
c1xx: fatal error C1083: Cannot open source file: 'pysurvival/cpp_extensions/_functions.cpp': No such file or directory
Any advice would be greatly appreciated. Really looking forward to trying this package out!
@JCCKwong, can you give more details on the steps you're taking and what happens after you execute them? Also, did you clone my forked repository (https://github.com/bacalfa/pysurvival) instead of the one from the original author (https://github.com/square/pysurvival)?
@bacalfa yes, I cloned your forked repository. Here's a screenshot of what I did on Anaconda 3.
First change the current directory to C:\Users\Jethro\pysurvival.
cd C:\Users\Jethro\pysurvival
Then run the python commands described above in https://github.com/square/pysurvival/issues/15#issuecomment-579584083.
It worked, thanks! Really appreciated your help!
Can it work with StratifiedKFold?
@DashengSong, have you tried it? I don't think I have.
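One way to combine StratifiedKFold with survival data (a sketch, not something tested with the fork) is to stratify on the event indicator E, so that every fold keeps roughly the same proportion of censored observations. The data below are synthetic placeholders.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic placeholder data: 20 subjects, 3 features, 12 events and 8 censored.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
T = rng.exponential(scale=10.0, size=20)
E = np.array([1] * 12 + [0] * 8)

skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
fold_event_rates = []
# E is passed as the "class label" only to decide how the folds are built.
for train_idx, test_idx in skf.split(X, E):
    X_train, T_train, E_train = X[train_idx], T[train_idx], E[train_idx]
    X_test, T_test, E_test = X[test_idx], T[test_idx], E[test_idx]
    # ... fit a survival model on the training part, evaluate on the test part ...
    fold_event_rates.append(E_test.mean())  # each fold keeps the 60% event rate
```

GridSearchCV also accepts an iterable of (train, test) index pairs as `cv`, so `cv=list(skf.split(X, E))` can reuse these stratified folds during tuning.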
Hi @bacalfa. Thanks for creating a package that can be installed on Windows. I'm trying to use the scikit-learn compatibility feature you've added. Does it work with the random survival forest estimator too?
@KaranMehta21, I think it does. But there may be a caveat: https://github.com/square/pysurvival/issues/17.
@bacalfa OK, I'll try it out. Is the benefit of using it that I can implement cross-validation and hyperparameter tuning, and will that lead to higher c-indices? Currently, the RSF model I've trained has a c-index of 0.71. I'm looking for ways to bring it closer to 0.80. Any suggestions?
Honestly, I haven't used this package much, so I'm not sure what to suggest. There are simpler and more complex models. It's a good habit to evaluate performance on a validation set (as in CV) and to perform hyperparameter tuning. It's difficult to know a priori which algorithm will be best, so try (and tune) as many as you can, and make sure you compare them fairly.
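Since the comparison hinges on the c-index, a minimal NumPy version of Harrell's concordance index may help when scoring each validation fold. It is a sketch with simplified handling of tied times, not PySurvival's own metric, whose tie handling may differ.

```python
import numpy as np


def harrell_c_index(T, E, risk):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i had the event (E[i] == 1) and
    T[i] < T[j]. It is concordant when the earlier failure has the higher risk
    score; ties in risk count as 1/2. Pairs tied in time are skipped here.
    """
    T = np.asarray(T, dtype=float)
    E = np.asarray(E, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    # comparable[i, j] is True when i failed before j's observed time.
    comparable = E[:, None] & (T[:, None] < T[None, :])
    concordant = comparable & (risk[:, None] > risk[None, :])
    tied = comparable & (risk[:, None] == risk[None, :])

    n_comparable = comparable.sum()
    if n_comparable == 0:
        raise ValueError("no comparable pairs")
    return (concordant.sum() + 0.5 * tied.sum()) / n_comparable
```

Averaging this value over the cross-validation folds for each candidate hyperparameter setting, and keeping the best average, gives the kind of fair comparison described above.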
Hi all, I would really appreciate it if anyone could help me. I have downloaded the package to C:\Users\User\Downloads\pysurvival-master, and I have Anaconda installed at C:\Users\User. Below are the steps I think I need to follow; please let me know whether they are correct so that I can carry out the installation.
Step 1: Create a directory C:\Users\User\pysurvival (as Anaconda is installed in C:\Users\User).
Step 2: Copy all contents from C:\Users\User\Downloads\pysurvival-master to C:\Users\User\pysurvival (so setup.py is now in this location).
Step 3: Navigate to C:\Users\User\pysurvival (using the command prompt).
Step 4: Run the two commands below:
- python setup.py build_ext --inplace (to rebuild the package)
- python setup.py install --user (to install the files to your local directories)
Hi @bacalfa ,
I've tried your fork with the setup.py
Unfortunately, it's still not working for me, because of this line: extra_compile_args = ["/O2"]
The error occurs when building the 'pysurvival.utils._functions' extension:
gcc: error: /O2: No such file or directory
error: command 'C:\MinGW\bin\gcc.exe' failed with exit status 1
Thanks!
@CoteDave, I don't have MinGW installed on my Windows machine (and it's not easy to do so). The error seems to suggest that /O2 is an option for the MS C/C++ compiler, which isn't recognized by MinGW. If you change line 61 in setup.py to the same thing as in line 63, I think it'd work. Let me know.
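A more portable alternative to hard-coding the flag is to choose it once the compiler is known. This is a sketch, not the fork's setup.py: `OPT_FLAGS`, `compile_args_for`, and `BuildExtWithCompilerFlags` are hypothetical names, and it assumes the extensions are declared without compiler-specific flags. It relies on setuptools' `build_ext` command, where `self.compiler.compiler_type` is "msvc" for Visual C++ and "mingw32" for MinGW.

```python
from setuptools.command.build_ext import build_ext

# Hypothetical mapping from compiler_type to an optimisation flag it understands.
OPT_FLAGS = {
    "msvc": ["/O2"],     # Microsoft Visual C++ syntax
    "mingw32": ["-O2"],  # GCC syntax (MinGW on Windows)
    "unix": ["-O2"],     # GCC/Clang on Linux and macOS
}


def compile_args_for(compiler_type):
    """Return optimisation flags for the given compiler type (GCC-style by default)."""
    return OPT_FLAGS.get(compiler_type, ["-O2"])


class BuildExtWithCompilerFlags(build_ext):
    """build_ext that appends compiler-appropriate flags once the compiler is known."""

    def build_extensions(self):
        args = compile_args_for(self.compiler.compiler_type)
        for ext in self.extensions:
            # Assumes each Extension was declared without /O2 or -O2 itself.
            ext.extra_compile_args = list(ext.extra_compile_args) + args
        super().build_extensions()
```

In setup.py this would be wired in with `setup(..., cmdclass={"build_ext": BuildExtWithCompilerFlags})`, so the same file works with both MSVC and MinGW.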
Hi @bacalfa, I changed line 61.
No more /O2 error, but sadly, a new error occurs at the same place:
building 'pysurvival.utils._functions' extension
c:/mingw/bin/../lib/gcc/mingw32/9.2.0/../../../../mingw32/bin/ld.exe: C:...\Anaconda3\libs/libpython38.dll.a: error adding symbols: file format not recognized
collect2.exe: error: ld returned 1 exit status
error: command 'C:\MinGW\bin\g++.exe' failed with exit status 1
@CoteDave, that error looks similar to this one. See the suggestion there.
I would also like to make a suggestion. Could you please add a Lasso regularization term to the loss functions of Linear Multi-Task Logistic Regression and Linear SVM, so that, as in scikit-learn, one can do Ridge, Lasso, or ElasticNet regularization? It would amount to adding a new parameter called "penalizer", such that line 191 of multi_task.py reads: loss += penalizer * (l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(torch.abs(w)))
Therefore, if l2_reg=1, one is doing Ridge regularization, if l2_reg=0 one is doing Lasso regularization, and when 0<l2_reg<1 one is doing ElasticNet.
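The proposed penalty can be checked numerically with a framework-free NumPy version (a sketch; `elastic_net_penalty` is a hypothetical helper, and multi_task.py would use the torch equivalents). In scikit-learn's ElasticNet terms, `penalizer` plays the role of `alpha` and `l2_reg` corresponds to `1 - l1_ratio`.

```python
import numpy as np


def elastic_net_penalty(w, penalizer, l2_reg):
    """penalizer * (l2_reg * ||w||_2^2 / 2 + (1 - l2_reg) * ||w||_1).

    l2_reg = 1 gives Ridge, l2_reg = 0 gives Lasso, 0 < l2_reg < 1 ElasticNet.
    """
    w = np.asarray(w, dtype=float)
    ridge = np.sum(w * w) / 2.0   # squared L2 norm, halved
    lasso = np.sum(np.abs(w))     # L1 norm; |w| rather than sqrt(w * w)
    return penalizer * (l2_reg * ridge + (1.0 - l2_reg) * lasso)


w = np.array([3.0, -4.0])
print(elastic_net_penalty(w, penalizer=2.0, l2_reg=1.0))  # Ridge: 25.0
print(elastic_net_penalty(w, penalizer=2.0, l2_reg=0.0))  # Lasso: 14.0
print(elastic_net_penalty(w, penalizer=2.0, l2_reg=0.5))  # ElasticNet: 19.5
```

Using abs instead of sqrt(w * w) gives the same value; under autograd, torch.abs also avoids the NaN gradient that torch.sqrt produces at w = 0.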
@elopezfune, regarding your error, see if this helps.
I'll see if I can help with the regularization request and will let you know.
@elopezfune, I'd prefer to create a branch for this request. Let's call it elastic_net_loss.
For MTLR, I'd do:
loss += l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(torch.abs(w))
For consistency, I should probably apply the same change to the other models. For SVM, that would require modifying Cython code (file _svm.pyx). I'll need more time to make sure I understand what changes to make. Any help is welcome. I'm actually not a user of this package at the moment, just trying to help maintain it for others. :)
Thanks for the quick answer. I believe ElasticNet will give the users more flexibility to optimize survival models.
Yes, a line of code like this is perfect!
loss += l2_reg * torch.sum(w * w) / 2. + (1.0 - l2_reg) * torch.sum(torch.abs(w))
I once tried to change it manually in my local files, but I didn't have access to the optimization code (Cython), so it didn't work.
Indeed, a new parameter needs to be introduced, namely penalizer or something similar, which sets the overall strength of the penalty, while l2_reg chooses between Ridge, Lasso, or ElasticNet.
The following packages have unmet dependencies:
python3-dev : Depends: libpython3-dev (= 3.8.2-0ubuntu2) but it is not going to be installed
Depends: python3.8-dev (>= 3.8.2-1~) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
What Python version do you have installed? It seems to be suggesting that you should have at least 3.8 to be able to install libpython3-dev.
These errors you're experiencing are specific to your Ubuntu system, not really to pysurvival. Once you have all the dependencies installed, you should be able to build pysurvival.
I have Python 3.8.6
You'll have to do some searching on the errors you're getting. I can't reproduce it because I currently don't have access to Ubuntu. See this.
Thanks, I'm trying to solve this problem; it's driving me crazy.
Adding support for l1 regularization to SVM isn't trivial. It requires modifications to Cython code (doable), but I can't find the reference for the formulation. And I don't have a lot of time to spend on this. If anyone would like to contribute or help, that'd be appreciated. SVM in this package doesn't use PyTorch (loss, gradient, and Hessian are manually implemented in Cython, so it's important to know the full formulation in order to modify it).
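Because the SVM solver needs the loss, gradient, and Hessian explicitly, and |w| has no derivative at 0, one common workaround (an assumption here, not a formulation taken from this package) is to replace |w| with the smooth surrogate sqrt(w**2 + eps). The penalty's gradient and Hessian then have closed forms, and its Hessian contribution is diagonal. A NumPy sketch, using the hypothetical helper `smooth_elastic_net`, whose outputs would be added to the existing loss, gradient, and Hessian:

```python
import numpy as np


def smooth_elastic_net(w, penalizer, l2_reg, eps=1e-8):
    """Value, gradient and Hessian diagonal of a smoothed elastic-net penalty.

    |w_k| is approximated by s_k = sqrt(w_k**2 + eps), which is twice
    differentiable, so the term fits a solver that uses loss, gradient
    and Hessian.
    """
    w = np.asarray(w, dtype=float)
    s = np.sqrt(w * w + eps)
    value = penalizer * (l2_reg * np.sum(w * w) / 2.0 + (1.0 - l2_reg) * np.sum(s))
    # d/dw_k: l2_reg * w_k + (1 - l2_reg) * w_k / s_k
    grad = penalizer * (l2_reg * w + (1.0 - l2_reg) * w / s)
    # d2/dw_k2: l2_reg + (1 - l2_reg) * eps / s_k**3 (off-diagonal terms are 0)
    hess_diag = penalizer * (l2_reg + (1.0 - l2_reg) * eps / s**3)
    return value, grad, hess_diag
```

In the solver, `value` would be added to the loss, `grad` to the gradient, and `np.diag(hess_diag)` to the Hessian. A smaller `eps` tracks |w| more closely but makes the Hessian stiffer near 0.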