
FEAT - Sample weights

Open sujay-pandit opened this issue 1 year ago • 15 comments

Description of the feature

Scikit-learn's 'fit' methods have an option to pass a 'sample_weight' array to assign importance to samples and modify the cost function accordingly.

Additional context
I wanted to use https://github.com/SteiMi/denseweight with MCP.
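
For reference, this is the scikit-learn convention in question (an illustrative sketch with arbitrary data and model):

    import numpy as np
    from sklearn.linear_model import Ridge

    X = np.random.randn(50, 5)
    y = X @ np.ones(5)
    sample_weight = np.ones(50)  # one importance value per sample

    # scikit-learn's convention: weights are passed at fit time
    Ridge().fit(X, y, sample_weight=sample_weight)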

sujay-pandit avatar Feb 13 '24 19:02 sujay-pandit

Hi @sujay-pandit, thanks for reaching out.

At first sight it should be doable to add a sample_weights attribute to a datafit, say Quadratic, and to take it into account:

  • when computing the datafit value
  • when computing the datafit gradient
  • when computing the Lipschitz constant

The calculation should simply amount to appropriate multiplications by sample_weights[i].
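
In formulas, with sample weights $s_i$ and the $1/(2n)$ normalization used by skglm's Quadratic datafit, the weighted quantities would read:

$$
F(w) = \frac{1}{2n} \sum_{i=1}^{n} s_i \,(y_i - (Xw)_i)^2,
\qquad
\frac{\partial F}{\partial w_j} = \frac{1}{n} \sum_{i=1}^{n} s_i \, X_{ij} \,((Xw)_i - y_i),
\qquad
L_j = \frac{1}{n} \sum_{i=1}^{n} s_i \, X_{ij}^2 .
$$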

Would you like to send a pull request to start working on it? We can guide you and help you through the process.

mathurinm avatar Feb 14 '24 07:02 mathurinm

Yes, I can take that up. Thanks! Can you start a branch for me?

sujay-pandit avatar Feb 15 '24 17:02 sujay-pandit

If I start the branch, you won't be able to push to it. You need to fork the repository, push to a branch in your fork, and then open a pull request here; see https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/creating-a-pull-request-from-a-fork
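
In commands, the workflow looks roughly like this (the branch name is only an example):

    # fork skglm on GitHub first, then:
    git clone git@github.com:<your-username>/skglm.git
    cd skglm
    git switch -c sample-weights
    # ...commit your work, then push and open a PR against scikit-learn-contrib/skglm
    git push -u origin sample-weights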

ChatGPT may also help on that matter :)

mathurinm avatar Feb 19 '24 08:02 mathurinm

@sujay-pandit scikit-learn has a great explanation of the pull request workflow; you can follow it, adapting the URLs (e.g., replacing scikit-learn by skglm): https://scikit-learn.org/stable/developers/contributing.html#how-to-contribute

mathurinm avatar Feb 19 '24 09:02 mathurinm

Got it! Thanks @mathurinm

sujay-pandit avatar Feb 20 '24 17:02 sujay-pandit

Any update here? This is a key feature for many problems, so I am likewise very interested.

kmedved avatar Mar 08 '24 17:03 kmedved

Pinging @sujay-pandit on this topic

mathurinm avatar Mar 08 '24 17:03 mathurinm

Hey @kmedved and @mathurinm, I did get started on it but need to finish some work for my paper submission in April. I will start working on this as soon as possible.

sujay-pandit avatar Mar 09 '24 00:03 sujay-pandit

Thanks @sujay-pandit! If you have already started writing some code, can you please open a draft PR so that other contributors can give early feedback and build on this work?

mathurinm avatar Mar 11 '24 09:03 mathurinm

Hi @mathurinm and @kmedved
Apologies for the delay in communication. I am starting afresh now.
Before I make changes, I wanted to get confirmation of my plan. For now, my focus is on implementing this for MCPRegression, which I can later extend to other estimators.
I think the majority of my changes are going to be in the datafit (Quadratic) class, when computing the loss and the gradient with the sample weights taken into account. Let me know if the plan below looks OK.

  1. I will add a parameter called 'sample_weights' to the MCPRegression constructor, defaulting to None. [class MCPRegression, estimators.py]

  2. The same parameter will also be added to the solver. [AndersonCD, anderson_cd.py]

  3. I feel the majority of my changes are here, in the datafit class. [class Quadratic, single_task.py]

    Datafit (single_task.py)
    Below are the changes I have in mind: I will scale the loss computation by sample_weights in the value method, and do the same while computing the gradient per coefficient (w_j) in the gradient_scalar method, as follows:

    import numpy as np

    # BaseDatafit lives in skglm's datafit base module
    from skglm.datafits.base import BaseDatafit


    class Quadratic(BaseDatafit):
        def __init__(self, sample_weights=None):
            self.sample_weights = sample_weights

        def value(self, y, w, Xw):
            # scale each squared residual by its sample weight
            if self.sample_weights is not None:
                return np.sum(self.sample_weights * (y - Xw) ** 2) / (2 * len(y))
            return np.sum((y - Xw) ** 2) / (2 * len(y))

        def gradient_scalar(self, X, y, w, Xw, j):
            # partial derivative of the datafit with respect to w_j
            if self.sample_weights is not None:
                return (X[:, j] * (Xw - y) * self.sample_weights).sum() / len(y)
            return (X[:, j] * (Xw - y)).sum() / len(y)

Question: I am not expecting my penalty classes (MCPenalty/WeightedMCPenalty) to change. Is my understanding correct?

Thanks!

sujay-pandit avatar May 10 '24 19:05 sujay-pandit

Hello @sujay-pandit, thanks a lot for the incoming PR. My plan would be:

  • 1 First, add the sample_weights to the Quadratic class; note that you will also have to change the Lipschitz constant computation (some work has already been started in #245)
  • 2 Then, before touching the estimator, I would check that the created datafit works with a penalty (L1, MCP) and AndersonCD, as done in this example: https://contrib.scikit-learn.org/skglm/auto_examples/plot_sparse_recovery.html#sphx-glr-auto-examples-plot-sparse-recovery-py
  • 3 Finally, modify the MCPRegression constructor as you said

IMO sample_weights should not be passed to AndersonCD: sample_weights should be passed to Quadratic, which itself will be passed to the solver.
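
For step 2, the check could look like the sketch below. GeneralizedLinearEstimator is skglm's public way to combine a datafit, a penalty and a solver; the WeightedQuadratic name and all numeric parameters are placeholders for whatever the PR ends up adding:

    import numpy as np

    from skglm import GeneralizedLinearEstimator
    from skglm.datafits import Quadratic
    from skglm.datafits import WeightedQuadratic  # assumed: the class this PR adds
    from skglm.penalties import MCPenalty
    from skglm.solvers import AndersonCD

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 40))
    y = X @ np.r_[np.ones(3), np.zeros(37)] + 0.1 * rng.standard_normal(100)

    # sanity check: with unit weights, the weighted datafit must match Quadratic
    w_ref = GeneralizedLinearEstimator(
        Quadratic(), MCPenalty(alpha=0.1, gamma=3.0), AndersonCD()).fit(X, y).coef_
    w_new = GeneralizedLinearEstimator(
        WeightedQuadratic(np.ones(len(y))),
        MCPenalty(alpha=0.1, gamma=3.0), AndersonCD()).fit(X, y).coef_
    np.testing.assert_allclose(w_ref, w_new, rtol=1e-5)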

Question: I am not expecting my penalty(MCPenalty/WeightedMCPenalty) classes to change. Is my understanding correct?

Yes indeed! No penalty should be modified when adding this new feature.

Quick question from my side: for the design, do we want to modify the Quadratic class, or create a WeightedQuadratic class which would inherit from Quadratic? (I would vote for a new class)

QB3 avatar May 10 '24 21:05 QB3

Hi @QB3 Thanks for the prompt feedback!

  • 1 First, add the sample_weights to the Quadratic class; note that you will also have to change the Lipschitz constant computation (some work has already been started in Altered Quadratic Class #245)

Yes, you are right; for the Lipschitz computation, I will have to scale the computed values by the sample weights.
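
Concretely, something like this (a sketch, assuming the same 1/n normalization as the unweighted Lipschitz computation):

    import numpy as np

    def weighted_lipschitz(X, sample_weights):
        # per-coordinate Lipschitz constants of the weighted quadratic datafit:
        # L_j = sum_i s_i * X_ij ** 2 / n_samples
        n_samples = X.shape[0]
        return (sample_weights[:, None] * X ** 2).sum(axis=0) / n_samples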

  • 2 Then, before touching the estimator, I would check that the created datafit works with a penalty (L1, MCP) and AndersonCD, as done in this example: https://contrib.scikit-learn.org/skglm/auto_examples/plot_sparse_recovery.html#sphx-glr-auto-examples-plot-sparse-recovery-py

Sounds good

  • 3 Finally, modify the MCPRegression constructor as you said

IMO sample_weights should not be passed to AndersonCD: sample_weights should be passed to Quadratic, which itself will be passed to the solver.

Agreed, that will be cleaner.

Quick question from my side: for the design, do we want to modify the Quadratic class, or create a WeightedQuadratic class which would inherit from Quadratic? (I would vote for a new class)

Are there any more changes planned along with sample weights? If not, I would vote to just modify the existing class; I don't think adding sample weights will require many changes to it.

sujay-pandit avatar May 14 '24 02:05 sujay-pandit

My take: to support both the classic Quadratic and a weighted Quadratic in one class, I see only two options:

  • have a Quadratic.weights attribute, which is None if no weights are used and a np.array of shape (n_samples,) otherwise. We keep the existing behavior if self.weights is None. This is implemented in #245, but unfortunately I don't think this mixed typing (None or np.array) is supported by numba; see the test failure here: https://github.com/scikit-learn-contrib/skglm/actions/runs/8631223494/job/23660827782?pr=245#step:6:375
  • have a Quadratic.weights attribute, which is np.ones(n_samples) if there are no weights, and a np.array of shape (n_samples,) otherwise. I'm afraid this would slow down the code in the unweighted case; I also don't see how to set the weights to 1 by default (without using a mutable default parameter value, which is dangerous).

So it's probably easier to go for two separate classes, Quadratic and WeightedQuadratic. This should make the PR straightforward.
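
A minimal skeleton of the two-class design (a sketch; method names follow Quadratic's interface, and the real class would also need the attribute spec that skglm's datafits declare for numba compilation):

    import numpy as np

    from skglm.datafits.base import BaseDatafit


    class WeightedQuadratic(BaseDatafit):
        # Separate from Quadratic: sample_weights is always an array, so
        # numba never has to handle mixed (None / np.array) typing.
        def __init__(self, sample_weights):
            self.sample_weights = sample_weights  # shape (n_samples,)

        def value(self, y, w, Xw):
            return np.sum(self.sample_weights * (y - Xw) ** 2) / (2 * len(y))

        def gradient_scalar(self, X, y, w, Xw, j):
            return (self.sample_weights * X[:, j] * (Xw - y)).sum() / len(y)

        def get_lipschitz(self, X, y):
            # weighted per-coordinate Lipschitz constants, cf. the formulas above
            return (self.sample_weights[:, None] * X ** 2).sum(axis=0) / len(y)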

mathurinm avatar May 14 '24 06:05 mathurinm

@mathurinm and @QB3, in that case we can go for separate classes; I wasn't aware of this numba issue. I have a question: the changes in #245 and this issue will clash. Can we keep just one of them? Or should I wait until the changes in #245 are complete?

sujay-pandit avatar May 14 '24 15:05 sujay-pandit

#245 is stalled, you can start a separate PR.

You can even start from it, to reuse the existing work. To do so, first fetch the PR branch, then switch to a new branch:

git fetch git@github.com:scikit-learn-contrib/skglm pull/245/head:nbb
git switch nbb
git switch -c your_desired_branch_name

and start working as usual on a branch.

mathurinm avatar May 14 '24 16:05 mathurinm

@sujay-pandit are you working on this? Some people asked for it offline, so if you have not started already, I will implement it. But if you're working on this, I'm happy to give feedback.

mathurinm avatar May 28 '24 10:05 mathurinm

@mathurinm I will push my changes by Thursday EOD. If I am not able to, I will hand it over to you. Apologies for the delay.

sujay-pandit avatar May 28 '24 15:05 sujay-pandit

Hi @mathurinm @kmedved @QB3, I have added a new class WeightedQuadratic in datafits/single_task.py and created a PR. I have tested it with examples/plot_sparse_recovery.py and it seems to work fine, both with all samples weighted equally to 1 and with weights generated using the DenseWeight library. I haven't added the changes to MCPRegression in estimators.py; if these changes are OK, I can move on to them. Waiting for your feedback, thanks!
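
For reference, the DenseWeight check looks roughly like this (a sketch; DenseWeight's alpha and the penalty parameters are arbitrary):

    import numpy as np
    from denseweight import DenseWeight

    from skglm import GeneralizedLinearEstimator
    from skglm.datafits import WeightedQuadratic  # the new class from the PR
    from skglm.penalties import MCPenalty
    from skglm.solvers import AndersonCD

    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 50))
    y = X @ np.r_[np.ones(5), np.zeros(45)] + 0.1 * rng.standard_normal(200)

    # density-based sample weights, as in the original feature request
    dw = DenseWeight(alpha=1.0)
    sample_weights = dw.fit(y)

    model = GeneralizedLinearEstimator(
        datafit=WeightedQuadratic(sample_weights),
        penalty=MCPenalty(alpha=0.1, gamma=3.0),
        solver=AndersonCD(),
    )
    model.fit(X, y)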

sujay-pandit avatar May 31 '24 17:05 sujay-pandit

This was implemented in #258 through a WeightedQuadratic class :rocket:

@sujay-pandit thanks one more time for the work; we look forward to hearing some feedback from you. Please spread the word about skglm too!

mathurinm avatar Jun 24 '24 14:06 mathurinm