mlpack
                                
                                 mlpack copied to clipboard
                                
                                    mlpack copied to clipboard
                            
                            
                            
                        Add checks for relative input shapes inside bindings.
What is the desired addition or change?
A parameter check should be added to all bindings that checks for the correct relative input shapes. For example the training data and training labels must have the same number of samples, testing data and training data must have the same number of features etc.
What is the motivation for this feature?
Refer to #2818 for more information and how to reproduce the bug.
If applicable, describe how this feature would be implemented.
- 
As discussed with @rcurtin the checks should be implemented inside src/mlpack/core/util/param_check.hppand then we should call these checks from the bindings.
- 
You can take a look at how RequireOnlyOnePassedis used the_main.cppfiles for each method in themlpackMain()function and is implemented insrc/mlpack/core/util/param_check_impl.hpp. These checks would also be implemented and used similarly.
- 
We need to check each binding (listed below) separately and add these checks if they are not already there. For example in Linear SVMsame number of columns between training data and labels are already checked here.
- 
[ ] adaboost 
- 
[ ] decision_stump 
- 
[ ] decision_tree 
- 
[ ] hoeffding_tree 
- 
[ ] linear_svm 
- 
[ ] logistic_regression 
- 
[ ] nbc 
- 
[ ] perceptron 
- 
[ ] random_forest 
- 
[ ] softmax_regression 
- 
[ ] bayesian_linear_regression 
- 
[ ] lars 
- 
[ ] linear_regression 
- 
[ ] dbscan 
- 
[ ] gmm_train 
- 
[ ] gmm_generate 
- 
[ ] gmm_probability 
- 
[ ] kmeans 
- 
[ ] mean_shift 
- 
[ ] approx_kfn 
- 
[ ] emst 
- 
[ ] fastmks 
- 
[ ] lsh 
- 
[ ] knn 
- 
[ ] kfn 
- 
[ ] krann 
- 
[ ] preprocess_split 
- 
[ ] preprocess_binarize 
- 
[ ] preprocess_describe 
- 
[ ] preprocess_scale 
- 
[ ] preprocess_one_hot_encoding 
- 
[ ] image_converter 
- 
[ ] cf 
- 
[ ] det 
- 
[ ] hmm_train 
- 
[ ] hmm_loglik 
- 
[ ] hmm_viterbi 
- 
[ ] hmm_generate 
- 
[ ] kde 
- 
[ ] nmf 
- 
[ ] kernel_pca 
- 
[ ] lmnn 
- 
[ ] local_coordinate_coding 
- 
[ ] nca 
- 
[ ] pca 
- 
[ ] radical 
- 
[ ] sparse_coding 
Note that not all of the bindings written above will require these checks because for some they would be there already. So it would be better to check this before changing any code (please comment below after checking so that I can update the description to indicate where the checks are not required).
Additional information?
If you can think of a better way to implement it then you can comment below and we can discuss it.
Update: Also replace the occurences where we are comparing sizes of matrices with utilities created in #2370.
I have changed the description above. Please check it again if you have already started working on this. Sorry for any inconvenience caused.
Hey, I would check
- 
[ ] adaboost 
- 
[ ] decision_stump 
- 
[ ] decision_tree 
- 
[ ] hoeffding_tree 
if these points are still free. I consider it a good starting point to dive into the project. Would take me about a week.
Hi @onikolskyy , sure feel free. Remember that you will also have to implement the checking functions inside param_check.hpp and then add to the bindings (if we are not already checking for the same conditions in some way or another). You can post any questions you have over here.
Is there anyone working on this issue??
I would like to work this to get started contributing.
We have add write declaration in param_check.hpp and add the definition in param_check_impl.hpp, right?
@shubhamabhang77 mate, I am working on the adaboost bug #2818 right now , would you mind taking one of the other points? I will report my status later today/tomorrow
@NippunSharma I have an issue with setting up a dev environment for the bindings. I have uploaded a question, perhaps there is a quick answer? #2843
yeah yeah sure. I will work on other points. You work on your points.
Hey guys,
I would be happy to make my first commit with a fix of  #2818.
However, I get a permission denied for my github acc when I try to push a new branch with the commit.
Do I need some extra rights or does the contribution pipeline work differently?
You cannot push directly to the mlpack master, you will not have the access. You will have to create a fork of this repo and then push to your fork and then create a pull request. You can look up on the net, the process is very easy and will not take much time.
Hi, Is there anyone currently working on this issue? I would like to work on the above issue.
Two updates relevant to this issue:
- #2370 contains some nice utilities for checking the size of a matrix. We should probably use those in whatever we build here.
- This comment tries to find all the existing places in the codebase where we are comparing sizes of matrices. All those usages should be replaced with the utilities being written here (if it is in a binding), or with the functions from #2370 (if not in a binding). Before closing this issue, we should be sure to check that that grep comes up empty. :)
I would like to work on this issue. I have looked through the comments and understood the problem. Where in the codebase should I look for functions that might need to check for the dimensions of matrices?
As in where should I start off and what approach should I follow?
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1:
Keep open.
Hey, @rcurtin @NippunSharma I assumed that this hasn't been fixed since the last comment is from a month ago so I thought I'd take a look at it. I was thinking of replacing all the occurrences of size and dimensionality checks using the functions added in #2370. I was taking a look at logistic regression and there was an instance of code that seemed similar to the function but can't be replaced directly by it - Logistic Regression Code instance. Would it be better to rewrite it slightly or keep it as it is? Thanks!
Hey @PranavReddyP16, thanks for showing interest in this issue. As you can read from the description we only need to include checks where they are not present. There is no need to replace the "if" conditions and other occurrences, at least for this issue. For better understanding, you can take a look at #2818, which is the actual issue we are facing. Let me know if you have any other doubts π
Two updates relevant to this issue:
- #2370 contains some nice utilities for checking the size of a matrix. We should probably use those in whatever we build here.
- This comment tries to find all the existing places in the codebase where we are comparing sizes of matrices. All those usages should be replaced with the utilities being written here (if it is in a binding), or with the functions from #2370 (if not in a binding). Before closing this issue, we should be sure to check that that grep comes up empty. :)
Hey, @NippunSharma thanks for the clarification. I was a little confused because of the second point of this comment from Ryan and I thought we had to replace if statements and add checks where needed. I'll get to work on adding check functions if they're not present. Just to confirm, I can use the functions implemented in #2370 in the (method)_main.cpp files in the mlpackMain() function am I right? Thanks again!
Ya, you are right. I am sorry for the confusion here. I forgot to update the issue description, I guess that is why it slipped my mind. I have added that now, thanks for reminding π. Just to clarify again, you will have to replace all other occurrences along with the checks. Yes you have to use functions implemented in #2370.You can create a PR and we can discuss any other doubts you have over there π
Cool, thanks a lot for clearing that up. I'll get started on it.
I would like to work on this
@ayushi19031, go ahead. I think @onikolskyy @PranavReddyP16 were working on this but it has been quite some time, best to ask them first if they are fine with itπ
HI @ayushi19031, I haven't worked on this since my last contribution. Please feel free to continue
Is this issue still open?
@VedangAsgaonkar yes, as you can see, the issue is not closed, so it is still open. :smile:
Please feel free to take a look at some of the PRs that have already been done for other bindings (you can look through the list of linked PRs), then pick a binding and open a PR for it. I'd recommend smaller PRs, one per binding, which will make it easier to get them merged. I think the description that @NippunSharma left above is sufficient to get started. :+1:
Hi ! I would like to take up PCA first and then move on to preprocess_split and other bindings.
I would like to work on this issue.
@AdarshSantoria feel free, if you open a PR it will be reviewed when possible. :+1:
Hi, I have replaced all the if checks in the Linear Regression method with the utility introduced in #2370. I have made a PR (#3640) looking forward to it being merged. I also plan to do this with other methods after this one is merged successfully.
Hello, I have only recently started with open source contributions, and I identified this as a good first issue for me to tackle. I would like to address this issue as specifically for PCA, NCA, Kernel-PCA and Perceptrons as a starting point. I am currently going through the codebase of mlpack to get an understanding of the code and dealing with the issue and will update once I start working on the coding part (expected in 1-2 days). So far, as mentioned in the above discussion, I believe using methods from size_checks.hpp will be the way forward.