
Implement variable importance in projection (VIP) and selectivity ratio

Open paucablop opened this issue 2 years ago • 7 comments

  • [ ] Include variable importance in projection, VIP (only works with PLS-like models)
  • [ ] Include selectivity ratio (only works with PLS-like models)

paucablop avatar Nov 17 '23 16:11 paucablop

Hi Pau,

It looks like you have this taken care of, but if you are interested in using my code you can take a look here:

https://github.com/mdarmstr/selrpy

https://github.com/mdarmstr/vipy

It's not the cleanest code, but I can incorporate it if you want something taken off your to-do list.

mdarmstr avatar Dec 07 '23 18:12 mdarmstr

Hi Michael!

Very cool that you have implemented both functions 🤩 That will be very useful. So far, I have only started implementing the scikit-learn interface for the selector.

Since these two variable selection methods need the PLS model, I imagine a selector where the model is passed in as an attribute when the object is instantiated, similar to scikit-learn's SelectFromModel (section 1.13.4, "Feature selection using SelectFromModel", of the scikit-learn user guide). Then I think we should add a check that the estimator is of PLS type, and raise an exception otherwise.

I will publish my Selector branch. Once I have figured out the sklearn interfaces, would you be interested in collaborating on implementing the mathematics of both selectors, and maybe writing some unit tests for them?

paucablop avatar Dec 09 '23 18:12 paucablop

We can definitely pass the variable selection apparatus as an attribute for the Selector branch. I can see how that would work in a pipeline using external validation data. Do you think we should work towards getting it to talk with the cross validation module as well? https://scikit-learn.org/stable/modules/cross_validation.html

mdarmstr avatar Dec 09 '23 18:12 mdarmstr

Yes, absolutely! Once we implement the API correctly, we will be able to integrate with the CV module too 🤩. There is an example where I used a grid search and CV to find the number of components in the PLS model: https://paucablop.github.io/chemotools/get-started/brewing_regressor.html#training-a-pls-model

paucablop avatar Dec 10 '23 09:12 paucablop

@mdarmstr I have created a branch; we can start implementing there :nerd_face:

paucablop avatar Dec 10 '23 09:12 paucablop

Hi Pau,

Sorry I'm only just getting around to this; some unexpected tasks came up. I'll try to begin work sometime this week!

mdarmstr avatar Dec 19 '23 12:12 mdarmstr

Hi Michael, no worries at all :smile:, there is no rush. I am also busy with other tasks at the moment! The most important thing is to make sure you have fun coding and enjoy the development experience :smile: :smile:

paucablop avatar Dec 20 '23 16:12 paucablop