
Added implementation for KMeans algorithm in GIL

Open Sayan-Chaudhuri opened this issue 3 years ago • 20 comments

This is an implementation of the K-means clustering algorithm for Boost.GIL. The input image filename (absolute or relative) must be passed as a command-line argument. During execution, enter the required number of iterations and the K value, which indicates the number of clusters. The output is stored in a file named output-kmeans.tif. I have attached example output from my implementation for the input image frog.jpg (attached), with the number of clusters and iterations set to 10 and 1 respectively. I would request the community to kindly provide feedback on my implementation. I would also like to use this implementation for my GIL competency test for GSoC 2021.

Description

References

Tasklist

  • [ ] Add test case(s)
  • [ ] Ensure all CI builds pass
  • [ ] Review and approve

Sayan-Chaudhuri avatar Mar 26 '21 19:03 Sayan-Chaudhuri
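For readers unfamiliar with the algorithm under discussion, the following is a minimal sketch of K-means (Lloyd's algorithm) on RGB pixel values. It is not the code submitted in this PR and does not use the Boost.GIL API: `Pixel`, `KMeansResult`, and `kmeans` are hypothetical names, pixels are assumed to be already extracted into a flat vector, and the centres are initialised from evenly spaced input pixels purely to keep the example reproducible.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Pixel = std::array<double, 3>;  // one RGB sample (hypothetical type)

// Squared Euclidean distance between two colours.
double squared_distance(const Pixel& a, const Pixel& b) {
    double sum = 0.0;
    for (std::size_t c = 0; c < 3; ++c) {
        const double d = a[c] - b[c];
        sum += d * d;
    }
    return sum;
}

struct KMeansResult {
    std::vector<Pixel> centres;       // k cluster centres
    std::vector<std::size_t> labels;  // cluster index for each input pixel
};

// Hypothetical helper, not the PR's interface.
KMeansResult kmeans(const std::vector<Pixel>& pixels, std::size_t k,
                    std::size_t iterations) {
    assert(k > 0 && pixels.size() >= k);
    const std::size_t n = pixels.size();

    KMeansResult result;
    result.labels.assign(n, 0);
    // Assumption: deterministic initialisation from evenly spaced pixels.
    for (std::size_t j = 0; j < k; ++j) {
        result.centres.push_back(pixels[j * n / k]);
    }

    // Assignment step: label each pixel with its nearest centre.
    auto assign = [&]() {
        for (std::size_t i = 0; i < n; ++i) {
            double best = std::numeric_limits<double>::max();
            for (std::size_t j = 0; j < k; ++j) {
                const double d = squared_distance(pixels[i], result.centres[j]);
                if (d < best) {
                    best = d;
                    result.labels[i] = j;
                }
            }
        }
    };

    for (std::size_t it = 0; it < iterations; ++it) {
        assign();
        // Update step: move each centre to the mean of its members.
        std::vector<Pixel> sums(k, Pixel{0.0, 0.0, 0.0});
        std::vector<std::size_t> counts(k, 0);
        for (std::size_t i = 0; i < n; ++i) {
            const std::size_t j = result.labels[i];
            for (std::size_t c = 0; c < 3; ++c) sums[j][c] += pixels[i][c];
            ++counts[j];
        }
        for (std::size_t j = 0; j < k; ++j) {
            if (counts[j] == 0) continue;  // empty cluster keeps its old centre
            for (std::size_t c = 0; c < 3; ++c) {
                result.centres[j][c] = sums[j][c] / static_cast<double>(counts[j]);
            }
        }
    }
    assign();  // make the labels consistent with the final centres
    return result;
}
```

To produce a clustered image, each pixel would be replaced by `result.centres[result.labels[i]]`, which corresponds to the quantised output described above.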

This algorithm is used everywhere so shouldn't this be introduced as a usable functionality of the library instead of an example?

lpranam avatar Mar 28 '21 08:03 lpranam

I had thought that my implementation would first be checked for correctness, so I wrote it as an example. But if you suggest it, I shall turn it into usable library functionality and then raise a pull request. I am also removing the output files, following your suggestion.

Sayan-Chaudhuri avatar Mar 28 '21 09:03 Sayan-Chaudhuri

Submitting an example first and then the actual implementation just increases the amount of work that reviewers and you will have to do. Let's just review the final thing.

lpranam avatar Mar 28 '21 10:03 lpranam

Codecov Report

Merging #587 (ca9e727) into develop (6e91e4b) will increase coverage by 0.13%. The diff coverage is n/a.

:exclamation: Current head ca9e727 differs from pull request most recent head e823ebd. Consider uploading reports for the commit e823ebd to get more accurate results

@@             Coverage Diff             @@
##           develop     #587      +/-   ##
===========================================
+ Coverage    78.59%   78.72%   +0.13%     
===========================================
  Files          117      118       +1     
  Lines         5003     5034      +31     
===========================================
+ Hits          3932     3963      +31     
  Misses        1071     1071              

codecov[bot] avatar Mar 28 '21 20:03 codecov[bot]

@lpranam I have made the necessary changes as suggested.

Sayan-Chaudhuri avatar Mar 29 '21 09:03 Sayan-Chaudhuri

new functionality also needs tests...

lpranam avatar Mar 29 '21 10:03 lpranam

@lpranam OK, I shall make the changes soon and update here.

Sayan-Chaudhuri avatar Mar 29 '21 20:03 Sayan-Chaudhuri

@lpranam Is it OK if I keep a static dataset for the test file? I saw the test files of other algorithms, where the input and expected output data, such as image pixel values, are fixed.

Sayan-Chaudhuri avatar Mar 30 '21 10:03 Sayan-Chaudhuri

@Sayan-Chaudhuri Yes, as long as it covers the cases, it is okay.

lpranam avatar Mar 30 '21 10:03 lpranam

@lpranam I wish to upload the test file along with the KMeans implementation but I want to clarify certain things.

  1. I have used a single dataset generated with the make_blobs() function in Python. I have tested my implementation against existing k-means implementations with random centre initialization, such as those in sklearn and OpenCV, and used the silhouette score obtained with those implementations on that dataset as the benchmark. Over a set of 20 runs of the algorithm using different centre initializations, I obtain a silhouette score above 0.8 for 90% of the runs. Using the existing implementations in sklearn and OpenCV, this score comes to 0.83 with the same number of runs. Is it OK if I submit my implementation then?
  2. I implemented the entire silhouette score algorithm while testing my implementation. The silhouette score is very common in clustering techniques, and Boost does not have a separate implementation of it. So should I make a separate header file for the silhouette score implementation so that it can be reviewed and later merged into the library if deemed OK? I have already searched and found no API for silhouette score calculation in Boost.

If you can kindly find time to clarify these doubts, I will be highly grateful.

Sayan-Chaudhuri avatar Mar 31 '21 18:03 Sayan-Chaudhuri
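The silhouette score mentioned in the comment above can be sketched as follows. This is an illustrative version under stated assumptions, not the header proposed in this PR: `Point`, `euclidean`, and `silhouette_score` are hypothetical names, the distance is assumed to be Euclidean, and points in singleton clusters are given a coefficient of 0, which is the usual convention. For each point, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the coefficient is (b - a) / max(a, b).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // hypothetical point type

// Euclidean distance between two points of equal dimension.
double euclidean(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d) {
        const double diff = a[d] - b[d];
        sum += diff * diff;
    }
    return std::sqrt(sum);
}

// Hypothetical helper: mean silhouette coefficient over all points.
// labels[i] is the cluster index (0 .. k-1) of points[i].
double silhouette_score(const std::vector<Point>& points,
                        const std::vector<std::size_t>& labels,
                        std::size_t k) {
    assert(!points.empty() && points.size() == labels.size() && k >= 2);
    const std::size_t n = points.size();

    std::vector<std::size_t> cluster_size(k, 0);
    for (std::size_t label : labels) {
        assert(label < k);
        ++cluster_size[label];
    }
    // The score is only defined when at least two clusters are non-empty.
    assert(std::count_if(cluster_size.begin(), cluster_size.end(),
                         [](std::size_t s) { return s > 0; }) >= 2);

    double total = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t own = labels[i];
        if (cluster_size[own] <= 1) continue;  // singleton contributes 0

        // Sum of distances from point i to the members of each cluster.
        std::vector<double> dist_sum(k, 0.0);
        for (std::size_t j = 0; j < n; ++j) {
            if (j != i) dist_sum[labels[j]] += euclidean(points[i], points[j]);
        }

        // a: mean distance to the other members of the same cluster.
        const double a = dist_sum[own] / static_cast<double>(cluster_size[own] - 1);
        // b: smallest mean distance to any other non-empty cluster.
        double b = std::numeric_limits<double>::infinity();
        for (std::size_t c = 0; c < k; ++c) {
            if (c == own || cluster_size[c] == 0) continue;
            b = std::min(b, dist_sum[c] / static_cast<double>(cluster_size[c]));
        }

        const double denom = std::max(a, b);
        if (denom > 0.0) total += (b - a) / denom;
    }
    return total / static_cast<double>(n);
}
```

A test in the spirit of the benchmark described above could then run the clustering several times and check that the returned score stays above a chosen threshold such as 0.8. This version computes all pairwise distances, so its cost grows quadratically with the number of points.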

Upon your clarification, I shall push the files accordingly

Sayan-Chaudhuri avatar Mar 31 '21 19:03 Sayan-Chaudhuri

You should upload it. If it does not fit, then it can obviously be removed, but we need to have a look.

lpranam avatar Apr 01 '21 04:04 lpranam

Also, your PR must pass all the CI checks.

lpranam avatar Apr 01 '21 04:04 lpranam

Added a new header file for Kmeans

Sayan-Chaudhuri avatar Apr 01 '21 07:04 Sayan-Chaudhuri

@lpranam

Sayan-Chaudhuri avatar Apr 01 '21 07:04 Sayan-Chaudhuri

@lpranam I have also added the test file

Sayan-Chaudhuri avatar Apr 01 '21 08:04 Sayan-Chaudhuri

Hi @Sayan-Chaudhuri, I had a look at this build, which contains the following message:

error: toolset gcc initialization: error: version '8' requested but 'g++-8' not found and version '7.5.0' of default 'g++' does not match>

I don't think this is related to any changes you made in this PR. You should probably update this branch with the latest develop branch of Boost.GIL. Pushing again after updating should probably solve this issue.

PS : I encountered a similar error here

meshtag avatar May 20 '21 06:05 meshtag

@meshtag how to do so what you have mentioned?

Sayan-Chaudhuri avatar May 23 '21 14:05 Sayan-Chaudhuri

There are a couple of ways to do this; you can look here to understand some common ones.

meshtag avatar May 23 '21 14:05 meshtag

@Sayan-Chaudhuri

how to do so what you have mentioned?

Please read the section of CONTRIBUTING.md on updating your PR. Updating or syncing a PR on GitHub is a common operation, so it is very well documented on the web, in the GitHub docs, and on StackOverflow.

Not to mention the lazy button-based way recently offered by GitHub https://github.blog/changelog/2021-05-06-sync-an-out-of-date-branch-of-a-fork-from-the-web/

mloskot avatar May 23 '21 18:05 mloskot