mlpack icon indicating copy to clipboard operation
mlpack copied to clipboard

Add checks for relative input shapes inside bindings.

Open NippunSharma opened this issue 4 years ago β€’ 25 comments

What is the desired addition or change?

A parameter check should be added to all bindings that checks for the correct relative input shapes. For example the training data and training labels must have the same number of samples, testing data and training data must have the same number of features etc.

What is the motivation for this feature?

Refer to #2818 for more information and how to reproduce the bug.

If applicable, describe how this feature would be implemented.

  • As discussed with @rcurtin the checks should be implemented inside src/mlpack/core/util/param_check.hpp and then we should call these checks from the bindings.

  • You can take a look at how RequireOnlyOnePassed is used the _main.cpp files for each method in the mlpackMain() function and is implemented in src/mlpack/core/util/param_check_impl.hpp. These checks would also be implemented and used similarly.

  • We need to check each binding (listed below) separately and add these checks if they are not already there. For example in Linear SVM same number of columns between training data and labels are already checked here.

  • [ ] adaboost

  • [ ] decision_stump

  • [ ] decision_tree

  • [ ] hoeffding_tree

  • [ ] linear_svm

  • [ ] logistic_regression

  • [ ] nbc

  • [ ] perceptron

  • [ ] random_forest

  • [ ] softmax_regression

  • [ ] bayesian_linear_regression

  • [ ] lars

  • [ ] linear_regression

  • [ ] dbscan

  • [ ] gmm_train

  • [ ] gmm_generate

  • [ ] gmm_probability

  • [ ] kmeans

  • [ ] mean_shift

  • [ ] approx_kfn

  • [ ] emst

  • [ ] fastmks

  • [ ] lsh

  • [ ] knn

  • [ ] kfn

  • [ ] krann

  • [ ] preprocess_split

  • [ ] preprocess_binarize

  • [ ] preprocess_describe

  • [ ] preprocess_scale

  • [ ] preprocess_one_hot_encoding

  • [ ] image_converter

  • [ ] cf

  • [ ] det

  • [ ] hmm_train

  • [ ] hmm_loglik

  • [ ] hmm_viterbi

  • [ ] hmm_generate

  • [ ] kde

  • [ ] nmf

  • [ ] kernel_pca

  • [ ] lmnn

  • [ ] local_coordinate_coding

  • [ ] nca

  • [ ] pca

  • [ ] radical

  • [ ] sparse_coding

Note that not all of the bindings written above will require these checks because for some they would be there already. So it would be better to check this before changing any code (please comment below after checking so that I can update the description to indicate where the checks are not required).

Additional information?

If you can think of a better way to implement it then you can comment below and we can discuss it.

Update: Also replace the occurences where we are comparing sizes of matrices with utilities created in #2370.

NippunSharma avatar Jan 29 '21 09:01 NippunSharma

I have changed the description above. Please check it again if you have already started working on this. Sorry for any inconvenience caused.

NippunSharma avatar Feb 05 '21 05:02 NippunSharma

Hey, I would check

  • [ ] adaboost

  • [ ] decision_stump

  • [ ] decision_tree

  • [ ] hoeffding_tree

if these points are still free. I consider it a good starting point to dive into the project. Would take me about a week.

nklskyoy avatar Feb 14 '21 18:02 nklskyoy

Hi @onikolskyy , sure feel free. Remember that you will also have to implement the checking functions inside param_check.hpp and then add to the bindings (if we are not already checking for the same conditions in some way or another). You can post any questions you have over here.

NippunSharma avatar Feb 14 '21 18:02 NippunSharma

Is there anyone working on this issue?? I would like to work this to get started contributing. We have add write declaration in param_check.hpp and add the definition in param_check_impl.hpp, right?

shubhamabhang77 avatar Feb 19 '21 10:02 shubhamabhang77

@shubhamabhang77 mate, I am working on the adaboost bug #2818 right now , would you mind taking one of the other points? I will report my status later today/tomorrow

nklskyoy avatar Feb 21 '21 17:02 nklskyoy

@NippunSharma I have an issue with setting up a dev environment for the bindings. I have uploaded a question, perhaps there is a quick answer? #2843

nklskyoy avatar Feb 21 '21 20:02 nklskyoy

yeah yeah sure. I will work on other points. You work on your points.

shubhamabhang77 avatar Feb 22 '21 07:02 shubhamabhang77

Hey guys, I would be happy to make my first commit with a fix of #2818. However, I get a permission denied for my github acc when I try to push a new branch with the commit. Do I need some extra rights or does the contribution pipeline work differently?

nklskyoy avatar Feb 23 '21 14:02 nklskyoy

You cannot push directly to the mlpack master, you will not have the access. You will have to create a fork of this repo and then push to your fork and then create a pull request. You can look up on the net, the process is very easy and will not take much time.

NippunSharma avatar Feb 23 '21 14:02 NippunSharma

Hi, Is there anyone currently working on this issue? I would like to work on the above issue.

roshanjain77 avatar Mar 04 '21 07:03 roshanjain77

Two updates relevant to this issue:

  • #2370 contains some nice utilities for checking the size of a matrix. We should probably use those in whatever we build here.
  • This comment tries to find all the existing places in the codebase where we are comparing sizes of matrices. All those usages should be replaced with the utilities being written here (if it is in a binding), or with the functions from #2370 (if not in a binding). Before closing this issue, we should be sure to check that that grep comes up empty. :)

rcurtin avatar Mar 07 '21 19:03 rcurtin

I would like to work on this issue. I have looked through the comments and understood the problem. Where in the codebase should I look for functions that might need to check for the dimensions of matrices?

As in where should I start off and what approach should I follow?

Rishabhc711 avatar Mar 22 '21 17:03 Rishabhc711

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1:

mlpack-bot[bot] avatar Apr 21 '21 18:04 mlpack-bot[bot]

Keep open.

NippunSharma avatar Apr 21 '21 20:04 NippunSharma

Hey, @rcurtin @NippunSharma I assumed that this hasn't been fixed since the last comment is from a month ago so I thought I'd take a look at it. I was thinking of replacing all the occurrences of size and dimensionality checks using the functions added in #2370. I was taking a look at logistic regression and there was an instance of code that seemed similar to the function but can't be replaced directly by it - Logistic Regression Code instance. Would it be better to rewrite it slightly or keep it as it is? Thanks!

PranavReddyP16 avatar May 13 '21 20:05 PranavReddyP16

Hey @PranavReddyP16, thanks for showing interest in this issue. As you can read from the description we only need to include checks where they are not present. There is no need to replace the "if" conditions and other occurrences, at least for this issue. For better understanding, you can take a look at #2818, which is the actual issue we are facing. Let me know if you have any other doubts πŸ‘

NippunSharma avatar May 16 '21 20:05 NippunSharma

Two updates relevant to this issue:

  • #2370 contains some nice utilities for checking the size of a matrix. We should probably use those in whatever we build here.
  • This comment tries to find all the existing places in the codebase where we are comparing sizes of matrices. All those usages should be replaced with the utilities being written here (if it is in a binding), or with the functions from #2370 (if not in a binding). Before closing this issue, we should be sure to check that that grep comes up empty. :)

Hey, @NippunSharma thanks for the clarification. I was a little confused because of the second point of this comment from Ryan and I thought we had to replace if statements and add checks where needed. I'll get to work on adding check functions if they're not present. Just to confirm, I can use the functions implemented in #2370 in the (method)_main.cpp files in the mlpackMain() function am I right? Thanks again!

PranavReddyP16 avatar May 16 '21 21:05 PranavReddyP16

Ya, you are right. I am sorry for the confusion here. I forgot to update the issue description, I guess that is why it slipped my mind. I have added that now, thanks for reminding πŸ‘. Just to clarify again, you will have to replace all other occurrences along with the checks. Yes you have to use functions implemented in #2370.You can create a PR and we can discuss any other doubts you have over there πŸ‘

NippunSharma avatar May 17 '21 01:05 NippunSharma

Cool, thanks a lot for clearing that up. I'll get started on it.

PranavReddyP16 avatar May 17 '21 09:05 PranavReddyP16

I would like to work on this

ayushi19031 avatar Jun 20 '21 07:06 ayushi19031

@ayushi19031, go ahead. I think @onikolskyy @PranavReddyP16 were working on this but it has been quite some time, best to ask them first if they are fine with itπŸ‘

NippunSharma avatar Jun 20 '21 07:06 NippunSharma

HI @ayushi19031, I haven't worked on this since my last contribution. Please feel free to continue

nklskyoy avatar Jun 21 '21 16:06 nklskyoy

Is this issue still open?

VedangAsgaonkar avatar Jan 13 '22 10:01 VedangAsgaonkar

@VedangAsgaonkar yes, as you can see, the issue is not closed, so it is still open. :smile:

Please feel free to take a look at some of the PRs that have already been done for other bindings (you can look through the list of linked PRs), then pick a binding and open a PR for it. I'd recommend smaller PRs, one per binding, which will make it easier to get them merged. I think the description that @NippunSharma left above is sufficient to get started. :+1:

rcurtin avatar Jan 13 '22 16:01 rcurtin

Hi ! I would like to take up PCA first and then move on to preprocess_split and other bindings.

eshaanagarwal avatar Jan 19 '22 14:01 eshaanagarwal

I would like to work on this issue.

AdarshSantoria avatar Dec 21 '22 17:12 AdarshSantoria

@AdarshSantoria feel free, if you open a PR it will be reviewed when possible. :+1:

rcurtin avatar Dec 23 '22 15:12 rcurtin

Hi, I have replaced all the if checks in the Linear Regression method with the utility introduced in #2370. I have made a PR (#3640) looking forward to it being merged. I also plan to do this with other methods after this one is merged successfully.

lumi232 avatar Feb 27 '24 09:02 lumi232

Hello, I have only recently started with open source contributions, and I identified this as a good first issue for me to tackle. I would like to address this issue as specifically for PCA, NCA, Kernel-PCA and Perceptrons as a starting point. I am currently going through the codebase of mlpack to get an understanding of the code and dealing with the issue and will update once I start working on the coding part (expected in 1-2 days). So far, as mentioned in the above discussion, I believe using methods from size_checks.hpp will be the way forward.

sukjingitsit avatar Feb 27 '24 18:02 sukjingitsit