project-kb
Add feature to indicate if a commit has a "twin"
"Important" commits are commonly back-ported to other branches. Security fixes in particular are often replicated across several branches, so whether a commit has a "twin" on another branch can be a useful signal for finding them.
Open questions:
- since commits might not be identical (the code might differ across branches), we need some measure of similarity that is tolerant to some differences
- the similarity measure needs to be computed efficiently
Could Chapter 3 ("Finding Similar Items") of Ullman's MMDS be what we need?
http://infolab.stanford.edu/~ullman/mmds/book.pdf
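As a rough illustration of the kind of technique that chapter covers, here is a minimal sketch of shingling plus MinHash applied to commit diffs. Everything in it is an assumption made for illustration: the function names, the whitespace-token shingles, `k=5`, and the 64 hash functions are hypothetical choices, not an existing project-kb API.

```python
import hashlib


def shingles(text, k=5):
    """Return the set of k-token shingles of a text (e.g. a commit diff).

    Splitting on whitespace makes the comparison tolerant to
    indentation and line-wrapping differences between branches.
    """
    tokens = text.split()
    if not tokens:
        return set()
    if len(tokens) < k:
        return {" ".join(tokens)}
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def minhash_signature(shingle_set, num_hashes=64):
    """Compute a MinHash signature: one minimum per salted hash function."""
    if not shingle_set:
        return [0] * num_hashes
    signature = []
    for seed in range(num_hashes):
        # A different salt per seed simulates a family of hash functions.
        salt = seed.to_bytes(8, "big")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode("utf-8"), digest_size=8, salt=salt).digest(),
                "big",
            )
            for s in shingle_set
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)
```

The signature of each commit would be computed once and stored, so comparing two commits only costs a pass over two short lists of integers instead of a full diff comparison. Finding candidate pairs without comparing every commit to every other one would additionally need something like the locality-sensitive hashing described in the same chapter, which this sketch does not cover.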
Or maybe a less sophisticated approach could be to just check whether two commit messages are the same. It is not perfect, but it is useful, and it can be replaced with a more advanced method later on.
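The message-based check could be sketched roughly as follows. The names `normalize_message` and `find_twins` and the `(commit_id, message)` input format are hypothetical, chosen only for this example. The normalization step is also an assumption: it collapses whitespace and drops the `(cherry picked from commit ...)` line that `git cherry-pick -x` appends, so that a back-port made that way still matches its original.

```python
from collections import defaultdict


def normalize_message(message):
    """Normalize a commit message so trivially different copies compare equal."""
    lines = [line.strip() for line in message.strip().splitlines()]
    # Drop the trailer appended by `git cherry-pick -x`.
    lines = [l for l in lines if not l.startswith("(cherry picked from commit")]
    # Collapse all runs of whitespace (including newlines) to single spaces.
    return " ".join("\n".join(lines).split())


def find_twins(commits):
    """Map each commit id to the ids of other commits with the same message.

    `commits` is an iterable of (commit_id, message) pairs.
    Commits with an empty message are never considered twins.
    """
    commits = list(commits)
    groups = defaultdict(list)
    for commit_id, message in commits:
        key = normalize_message(message)
        if key:
            groups[key].append(commit_id)

    twins = {commit_id: [] for commit_id, _ in commits}
    for ids in groups.values():
        for commit_id in ids:
            twins[commit_id] = [other for other in ids if other != commit_id]
    return twins
```

Grouping by normalized message takes a single pass over the commits, so it should scale well. Generic messages such as "fix typo" would produce false twins, so the resulting feature is probably best treated as a hint rather than as proof of a back-port.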