gitea
gitea copied to clipboard
Respect Co-authored-by trailers in the contributors graph
Resolves #29183.
Added an automated test (and refactored the test code around it) and also manually tested the Contributors, Commit Frequency, and Recent Commits pages in a few small repos and the output appears to be fine.
Now that I am looking at into it further, I think Pulse may need to account for co-authors too. I think this function is the one that needs updating
Hmm I'm not sure how common it is, but should I account for duplicate Co-authored-by entries i.e. entries that have the same email in a single commit?
Recent change made the co-author parsing logic more robust to inputs where the co-author's name may include a '<'.
I manually tested all the code paths for parseCoAuthorTrailerValue() and they appear fine. Even though it is unexported, should I just include unit tests for it since the other test I added doesn't cover all its paths?
Hmm I'm not sure how common it is, but should I account for duplicate
Co-authored-byentries i.e. entries that have the same email in a single commit?
I guess duplicates (by email) should count as one.
Recent change made the co-author parsing logic more robust to inputs where the co-author's name may include a '<'.
I manually tested all the code paths for
parseCoAuthorTrailerValue()and they appear fine. Even though it is unexported, should I just include unit tests for it since the other test I added doesn't cover all its paths?
I'm reminded of a recent vulnerability GitHub had in their commit message parser surrounding these meta lines. Hope we don't replicate it, maybe you can double-check and add a test.
BTW is it feasible to move all commit message parsing into a shared function? As I see it, parsing these Co-Authored-By lines is not specific to Commit graph and could be useful elsewhere too.
Recent changes exports the trailer parsing logic and adds adds parameterized tests for it. We now also handle duplicated Co-authored-by entries (i.e. entries that have the same email).
I'm reminded of a recent vulnerability GitHub had in their commit message parser surrounding these meta lines. Hope we don't replicate it, maybe you can double-check and add a test.
Nice read! It appears it is caused by their parser's use of regex, where it iterates a commit's metadata lines looking for its first match with a line that starts with author. From what I understand the main problem is that you can author a commit without a name (while working in a Codespaces environment) where the parser would mistakenly not catch; it would then continue iterating over the commit metadata and eventually find a false author line that was added by the attacker.
I have included some tests for my parser and I'm open to include more possible inputs.
I realized I need to account for the case where when we are iterating over the trailers we encounter co-author email = commit author email. The current implementation would count this as 2 commits for the commit author. Will make the necessary changes for this and address thte PR feedback as soon as I get the time.