gitea icon indicating copy to clipboard operation
gitea copied to clipboard

Respect Co-authored-by trailers in the contributors graph

Open kemzeb opened this issue 1 year ago • 9 comments

Resolves #29183.

Added an automated test (and refactored the test code around it) and also manually tested the Contributors, Commit Frequency, and Recent Commits pages in a few small repos and the output appears to be fine.

kemzeb avatar May 09 '24 20:05 kemzeb

Now that I am looking at into it further, I think Pulse may need to account for co-authors too. I think this function is the one that needs updating

kemzeb avatar May 09 '24 20:05 kemzeb

Hmm I'm not sure how common it is, but should I account for duplicate Co-authored-by entries i.e. entries that have the same email in a single commit?

kemzeb avatar May 09 '24 21:05 kemzeb

Recent change made the co-author parsing logic more robust to inputs where the co-author's name may include a '<'.

I manually tested all the code paths for parseCoAuthorTrailerValue() and they appear fine. Even though it is unexported, should I just include unit tests for it since the other test I added doesn't cover all its paths?

kemzeb avatar May 11 '24 23:05 kemzeb

Hmm I'm not sure how common it is, but should I account for duplicate Co-authored-by entries i.e. entries that have the same email in a single commit?

I guess duplicates (by email) should count as one.

silverwind avatar May 13 '24 21:05 silverwind

Recent change made the co-author parsing logic more robust to inputs where the co-author's name may include a '<'.

I manually tested all the code paths for parseCoAuthorTrailerValue() and they appear fine. Even though it is unexported, should I just include unit tests for it since the other test I added doesn't cover all its paths?

I'm reminded of a recent vulnerability GitHub had in their commit message parser surrounding these meta lines. Hope we don't replicate it, maybe you can double-check and add a test.

silverwind avatar May 13 '24 21:05 silverwind

BTW is it feasible to move all commit message parsing into a shared function? As I see it, parsing these Co-Authored-By lines is not specific to Commit graph and could be useful elsewhere too.

silverwind avatar May 13 '24 21:05 silverwind

Recent changes exports the trailer parsing logic and adds adds parameterized tests for it. We now also handle duplicated Co-authored-by entries (i.e. entries that have the same email).

kemzeb avatar May 14 '24 06:05 kemzeb

I'm reminded of a recent vulnerability GitHub had in their commit message parser surrounding these meta lines. Hope we don't replicate it, maybe you can double-check and add a test.

Nice read! It appears it is caused by their parser's use of regex, where it iterates a commit's metadata lines looking for its first match with a line that starts with author. From what I understand the main problem is that you can author a commit without a name (while working in a Codespaces environment) where the parser would mistakenly not catch; it would then continue iterating over the commit metadata and eventually find a false author line that was added by the attacker.

I have included some tests for my parser and I'm open to include more possible inputs.

kemzeb avatar May 14 '24 07:05 kemzeb

I realized I need to account for the case where when we are iterating over the trailers we encounter co-author email = commit author email. The current implementation would count this as 2 commits for the commit author. Will make the necessary changes for this and address thte PR feedback as soon as I get the time.

kemzeb avatar Oct 09 '24 18:10 kemzeb